A computer system coordinates the handling of error codes and information describing errors in a commit procedure. The system supports a first resource manager of a first type and a second resource manager of a second type. An application is coupled to a sync point manager and initiates a two-phase commit procedure. The sync point manager is coupled to the first and second resource managers and coordinates the two-phase commit procedure involving the first and second resource managers. The sync point manager receives notification of a failure or failures relating to the first and second resource managers that prevent completion of the commit procedure and identification of the resource manager or resource managers associated with the failure or failures. The sync point manager sends to the application a failure notification after receipt of the notification of a failure or failures relating to either or both of the resource managers, and upon request, also sends to the application the identification of the resource manager or resource managers associated with the failure or failures. The sync point manager also receives cause of failure information for each failure, and sends the cause of failure information to the application upon request.
Coordinated handling of error codes and information describing errors in a commit procedure.



BACKGROUND OF THE INVENTION

The invention relates generally to computer operating systems, and deals more particularly with a computer operating system which coordinates the handling of error flags and information describing errors in a commit procedure for heterogeneous resources.

The operating system of the present invention can be used in a network of computer systems. Each such computer system can comprise a central, host computer and a multiplicity of virtual machines or other types of execution environments. The host computer for the virtual machines includes a system control program to schedule access by each virtual machine to a data processor of the host, and to help manage the resources of the host, including a large memory, such that each virtual machine appears to be a separate computer. Each virtual machine can also converse with the other virtual machines to send messages or files via the host. Each VM (trademark of IBM Corp. of Armonk, NY) virtual machine has its own CMS portion of the system control program to interact with (i.e., receive instructions from and provide prompts for) the user of the virtual machine ("CMS" is a trademark of IBM Corp. of Armonk, NY). There may be resources such as a shared file system (SFS) and shared SQL relational databases which are accessible by any user virtual machine and the host.

Each such system is considered to be one real machine. It is common to interconnect two or more such real machines in a network, and transfer data via conversations between virtual machines of different real machines. Such a transfer is made via communication facilities such as AVS Gateway and VTAM facilities ("AVS Gateway" and "VTAM" are trademarks of IBM Corp. of Armonk, NY).

An application can change a database or file resource by first making a work request defining the changes. In response, provisional changes according to the work request are made in shadow files while the original database or file is unchanged. At this time, the shadow files are not valid. Then, the application can request that the changes be committed to validate the shadow file changes, and thereby substitute the shadow file changes for the original file. A one-phase commit procedure can be utilized. The one-phase commit procedure consists of a command to commit the change of the resource as contained in the shadow file. When resources such as SFS or SQL resources are changed, the commits to the resources can be completed in separate one-phase commit procedures. In the vast majority of cases, all resources will be committed in the separate procedures without error or interruption. However, if a problem arises during any one-phase commit procedure, some of the separate commits may have completed while others have not, causing inconsistencies. The cost of rebuilding non-critical resources after the problem may be tolerable in view of the efficiency of the one-phase commit procedure.

However, a two-phase commit procedure is required to protect critical resources and critical conversations. For example, assume a first person's checking account is represented in a first database and a second person's savings account is represented in a second database. If the first person writes a check to the second person and the second person deposits the check in his/her savings account, the two-phase commit procedure ensures that if the first person's checking account is debited then the second person's savings account is credited or else neither account is changed. The checking and savings accounts are considered protected, critical resources because it is very important that data transfers involving the checking and savings accounts be handled reliably. An application program can initiate the two-phase commit procedure with a single command, which procedure consists of the following steps, or phases:

(1) During a prepare phase, each participant (debit and credit) resource is polled by the sync point manager to determine if the resource is ready to commit all changes. Each resource promises to complete the resource update if all resources successfully complete the prepare phase, i.e., are ready to be updated.

(2) During a commit phase, the sync point manager directs all resources to finalize the updates or back them out if any resource could not complete the prepare phase successfully.
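
The two phases above can be sketched as follows. This is only an illustrative sketch, not the implementation described in this specification: the `Resource` class, its `can_prepare` flag, and the function name `two_phase_commit` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical participant; `can_prepare` simulates its prepare vote."""
    name: str
    can_prepare: bool = True
    state: str = "pending"

    def prepare(self) -> bool:
        # Phase 1: promise to commit if every participant also votes yes.
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def backout(self) -> None:
        self.state = "backed_out"


def two_phase_commit(resources):
    """Return True if all resources committed, False if all were backed out."""
    # Phase 1 (prepare): poll every participant.
    votes = [r.prepare() for r in resources]
    # Phase 2 (commit): finalize only if every vote was yes, else back out.
    if all(votes):
        for r in resources:
            r.commit()
        return True
    for r in resources:
        r.backout()
    return False
```

In the checking/savings example, a failed prepare on either account leaves both accounts backed out, so neither the debit nor the credit takes effect alone.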

If there is an error or failure during a two-phase commit procedure, it is important to advise the application of the nature of the problem so that it can assist in correcting the problem or taking other action. For example, if a synchronization point cannot be obtained because a participating file is open, then it is preferable to advise the application of the state of the file so the application can proceed with another operation and request a commit for this file later. Also, if a synchronization point is requested for a protected conversation, and the protected conversation is in an improper state to commit, then the application can endeavor to change the state of the protected conversation and subsequently request a synchronization point. Thus, it is important that the application know which of the participating resources failed and have detailed information describing the nature of the error.

As noted above, different types of resources can be accessed by an application. Different types of managers of the different resources can have different protocols for responding to failures occurring during a synchronization point. In the case of the prior art VM Shared File System, the application can provide the address of a location in the application execution environment to store a copy of the error information. If an error arises, then the error information is automatically transmitted from the resource manager to this location. The information includes one or more error descriptions and identifies the resource which failed. In this example, the application is familiar with the format of the information furnished by the resource.

In the prior art SQL/DS Relational Data Base System, when an error occurs during a work request involving the SQL/DS system, a manager within the SQL/DS system detects the error and transmits detailed error information to a memory space known to the distributed application's environment. Next, the application can read and analyze the error information from the memory space referenced above. Other resources and resource managers exist with other, different protocols, and new resource managers will have their own protocols optimized for their own purposes.

Also, in the prior art, if an application initiates a protected conversation to a communications partner, and the protected conversation subsequently fails due, for example, to a loss of communication, the VTAM communications facility detects the failure, and transmits an error return code to the application. The error return code indicates the existence of a failure and the cause of failure. The application program knows which partner failed because this prior art system supported commands to a single partner only.

According to the prior art also, the resources and protected conversations are treated independently insofar as error return codes and detailed error information are concerned.

Accordingly, a general object of the present invention is to provide an operating system which coordinates the collection of information from heterogeneous resources describing errors in a synchronization point.

Another object of the present invention is to provide an operating system of the foregoing type which coordinates the distribution of the detailed information, especially the resource type and the name, or identification, of any failing resource, to an initiating distributed application.

Another object of the present invention is to provide an operating system of the foregoing type which does not affect system performance if no errors occur.

Another object of the present invention is to provide operating systems of the foregoing types which are compatible with the architecture and design of existing resource managers.

Still another object of the present invention is to provide an operating system of the foregoing type which permits prior art applications that access the VM Shared File System and SQL/DS system described above, and other existing resource managers, to run unchanged on the operating system defined by the present invention.

These and other objects are solved in an advantageous manner by applying the features laid down in the independent claims. Further developments of these basic solutions are contained in the related dependent subclaims.

SUMMARY OF THE INVENTION 

The invention resides in a computer system which coordinates the handling of error codes and information describing errors in a commit procedure. The system supports a first resource manager of a first type and a second resource manager of a second type. An application is coupled to a sync point manager and initiates a commit procedure. The sync point manager is coupled to the first and second resource managers and coordinates the two-phase commit procedure involving the first and second resource managers. The sync point manager receives notification of a failure or failures relating to the first and second resource managers that prevent completion of the commit procedure and identification of the resource manager or resource managers associated with the failure or failures. The sync point manager sends to the application a failure notification after receipt of the notification of a failure or failures relating to either or both of the resource managers, and upon request, also sends to the application the identification of the resource manager or resource managers associated with the failure or failures. The sync point manager also receives cause of failure information for each failure, and sends the cause of failure information to the application upon request. The cause of failure information originally sent by one of the resource managers may not be in a form which the application can understand, and upon request by the application, the resource manager can access the cause of failure information from the sync point manager, decipher it, and transmit it to the application.
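
The division of labor summarized above (a bare failure notification first, then identification and cause of failure only upon request) might be sketched as follows. This is a minimal sketch under stated assumptions: the class `SyncPointManager`, the record type `FailureRecord`, the method names, and the idea of registering a per-type decoder function on behalf of a resource manager are all hypothetical and are not taken from this specification.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    resource_type: str   # e.g. "SFS" or "SQL/DS"
    resource_name: str
    raw_cause: bytes     # opaque, resource-manager-specific format


class SyncPointManager:
    """Hypothetical collector of failure information for one commit."""

    def __init__(self):
        self._failures = []
        self._decoders = {}   # resource_type -> function supplied by that manager

    def register_decoder(self, resource_type, decoder):
        # Assumption: each resource manager supplies a way to decipher its data.
        self._decoders[resource_type] = decoder

    def report_failure(self, record):
        # Called on behalf of a resource manager when a failure occurs.
        self._failures.append(record)

    def commit_status(self):
        # The application first sees only a failure notification.
        return "FAILED" if self._failures else "OK"

    def failing_resources(self):
        # Upon request: which resource managers were involved.
        return [(f.resource_type, f.resource_name) for f in self._failures]

    def cause_of_failure(self, resource_name):
        # Upon request: the cause, deciphered by the owning manager if possible.
        for f in self._failures:
            if f.resource_name == resource_name:
                decoder = self._decoders.get(f.resource_type)
                return decoder(f.raw_cause) if decoder else f.raw_cause
        return None
```

Because nothing is recorded or decoded unless a failure is reported, the error-free path in this sketch does no extra work, which matches the stated object of not affecting performance when no errors occur.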

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a block diagram of a computer system which incorporates all commit and recovery functions in each execution environment, according to the prior art.

FIG. 2 is a block diagram of a computer network including two interconnected computer systems according to the present invention. Each of the systems supports multiple execution environments with a common recovery facility and log.

FIG. 3 is a flowchart of a two-phase commit procedure for resources used by an application running in an execution environment of FIG. 2.

FIG. 4 is a flowchart of recovery processing that is implemented when an interruption occurs during the two-phase commit procedure described in FIG. 3.

FIGS. 5 (A) and 5 (B) are a flowchart of a two-phase commit procedure for resources used by partner applications running in two distributed application environments connected by a protected conversation supporting sync point facilities of FIG. 2.

FIG. 6 is a block diagram illustrating plural work units defining different commit scopes within a single application environment of FIG. 2, and a commit scope traversing more than one system of FIG. 2.

FIG. 7 is a flowchart illustrating the use of local work units and a global logical unit of work by one application environment of FIG. 2 to define the scope of and facilitate commit processing.

FIG. 8 is a flowchart illustrating the use of local work units and the global logical unit of work of FIG. 7 by another related application environment of FIG. 2 to define the scope of and facilitate commit processing.

FIG. 9 is a timing diagram of a protected conversation in the global logical unit of work of FIGS. 7 and 8.

FIG. 10 is a block diagram that illustrates automatic and generic registration of resources within the systems of FIG. 2.

FIG. 11 is a flowchart illustrating a procedure for registering resources in a sync point manager of FIG. 6 for a suitable type of commit procedure and the steps of the commit procedure.

FIG. 12 is a block diagram illustrating registration on a work unit basis within the systems of FIG. 2.

FIG. 13 is a time flow diagram of bank transactions illustrating registration on a work unit basis.

FIG. 14 is a flowchart illustrating a procedure for registering resources, changing registration information for resources and unregistering resources in the sync point manager.

FIG. 15 is a flowchart illustrating the procedure used by resource adapters, protected conversation adapters, and the sync point manager to unregister resources.

FIG. 16 is a flowchart illustrating processing by the sync point manager in response to a sync point request, and optimizations by the sync point manager in selecting one-phase or two-phase commit procedures.



FIG. 17 is a flowchart illustrating the two-phase commit procedure.

FIG. 18 is a flow diagram illustrating three distributed application programs participating in a two-phase commit procedure.

FIG. 19 is a block diagram illustrating the components and procedure for exchanging log names to support recovery of a failed commit procedure when a protected conversation is made between an application in one system and a partner application in another system of FIG. 2.

FIGS. 20 (A) and 20 (B) are flowcharts of communications facility processing associated with FIG. 19 for an initial event and a subsequent conversation event, respectively.

FIG. 21 is a flowchart of recovery facility processing associated with FIG. 19 that results when a local communications facility requests that the recovery facility exchange log names for a path.

FIG. 22 is a flowchart of recovery facility processing associated with FIG. 19 that results from receiving an exchange of log names request from another recovery facility.

FIG. 23 is a block diagram illustrating the components and procedure for exchanging log names with a local resource manager in a section of FIG. 2.

FIG. 24 is a block diagram illustrating the components and procedure for exchanging log names using a system of FIG. 2 and a remote resource manager.

FIG. 25 is a block diagram illustrating the contents of a recovery facility of FIG. 2.

FIG. 26 is a flowchart illustrating the procedures if an application makes a work request to the resource adapter.

FIG. 27 is a flowchart illustrating the processing for exchange of log names between a participating resource manager and the recovery facility.

FIG. 28 is a block diagram illustrating portability of the sync point log and capability for activating back up recovery facilities.

FIG. 29 is a block diagram which illustrates participation by the resource adapter and sync point manager of FIG. 2 in passing an error flag and information that defines a problem in a commit procedure to an application program.

FIG. 30 is a flowchart illustrating a procedure for using the components of FIG. 29 to pass the error information to the application program.

FIG. 31 is a control block structure for sharing the pages used by error blocks associated with FIG. 29 in order to reduce system working storage.

FIG. 32 is a block diagram of components of FIG. 2 that participate in the generation and management of the error flags and information of FIG. 29.

FIG. 33 is a block diagram illustrating three systems, including commit cycles that encompass more than one of the systems, commit scopes incorporating resource managers that reside in the same and different systems as an initiating application, and communications paths employed during commit processing as well as paths used for sync point recovery processing.

FIG. 34 is a block diagram illustrating three participating applications and application environments from FIG. 33 and the resource managers that they employ, forming a tree of sync point participants.

FIG. 35 is a high level flowchart illustrating the recovery facility procedures for pre-sync point agreements and procedures for recovery from a sync point failure.

FIG. 36 is a flowchart illustrating in more detail the recovery facility procedures for recovery from a sync point failure.

FIG. 37 is a block diagram illustrating the contents of logs 72 of FIG. 2 and control structures required to control the procedures represented by FIG. 35.

FIG. 38 is a flowchart providing detail for FIG. 35, steps 299 and 300.

FIG. 39 is a flowchart providing detail for FIG. 35, steps 301 and 302.

FIG. 40 is a flowchart providing detail for FIG. 36, step 311.

FIG. 41 is a flowchart providing detail for FIG. 36, step 312.

FIG. 42 is a flowchart providing detail for FIG. 36, step 313.

FIG. 43 is a flowchart providing detail for FIG. 36, step 314.

FIG. 44 is a flowchart providing detail for FIG. 36, step 315.

FIG. 45 is a flowchart providing detail for FIG. 36, step 304.

FIG. 46 is a flowchart providing detail for FIG. 36, step 317.

FIG. 47 is a flowchart providing detail for FIG. 36, step 318.

FIG. 48 is a flowchart providing detail for FIG. 36, step 319.

FIG. 49 is a flowchart providing detail for FIG. 36, step 306.

FIGS. 50 (A) and 50 (B) are block diagrams which illustrate application 56A and application 56D requesting asynchronous resynchronization should an error occur during sync point processing.

FIG. 51 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving an additional system 50C.

FIG. 52 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving a failed backout order originating from system 50C.

FIG. 53 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving a failed backout order originating from system 50A.

FIG. 53A is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving a failed prepare call originating from system 50A.

FIG. 54 is a block diagram of another embodiment of the invention as an alternate to FIG. 2.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Referring to the drawings in detail wherein like reference numerals indicate like elements throughout the several views, Figure 1 illustrates an LU6.2 syncpoint tower model or architecture according to the Prior Art. This architecture is defined as one execution environment. In the illustrated example, three application programs 14, 16, and 18 are run in execution environment 12 in a time-shared manner. Resource Managers 26 and 27, DB/2 or CICS File Control (DB/2 and CICS are trademarks of IBM Corp.), control access to resources 22 and 24, respectively. It should be noted that if a DB/2 (CICS/MVS operating system) or a SQL/DS (CICS/VSE operating system) resource manager were located outside of environment 12, then environment 12 would include a resource adapter to interface to the resource manager according to the prior art. In this prior art architecture, application program 14 makes a work request invoking resources 22 and 24 to syncpoint manager 20 before requesting committal of resources involved in the work request.

Next, application program 14 requests a commit from syncpoint manager 20 to commit the data updates of the previous work request. In response, syncpoint manager 20 implements a two-phase commit procedure by polling resource managers 26 and 27 to determine if they are ready to commit the resources and if so, to subsequently order the commit. At each phase (and each step of each phase) of the two-phase commit procedure, the syncpoint manager transfers syncpoint information to log 30 indicating the state of the two-phase commit procedure. If a failure occurs during the two-phase commit procedure, the syncpoint manager will implement a synchronization point recovery procedure to bring the resources to a consistent state. The syncpoint manager relies on the synchronization point information in log 30 to determine how far the two-phase commit procedure had progressed before interruption.
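
How a log of syncpoint states can drive recovery may be illustrated with the sketch below. The state names ("prepare_started", "commit_ordered", and so on), the class `SyncPointLog`, and the rule that an interrupted procedure is backed out unless a commit decision was logged are assumptions made for illustration; the text above states only that the log records how far the procedure had progressed.

```python
class SyncPointLog:
    """Hypothetical append-only log of (unit_of_work_id, state) entries."""

    def __init__(self):
        self._entries = []

    def write(self, unit_id, state):
        self._entries.append((unit_id, state))

    def last_state(self, unit_id):
        # Scan backwards for the most recent state of this unit of work.
        for uid, state in reversed(self._entries):
            if uid == unit_id:
                return state
        return None


def recovery_action(log, unit_id):
    """Pick a recovery step from the last logged state (assumed convention)."""
    last = log.last_state(unit_id)
    if last in ("committed", "backed_out"):
        return "none"            # the procedure had already finished
    if last == "commit_ordered":
        return "finish_commit"   # the commit decision was logged; carry it out
    return "backout"             # no commit decision on record
```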

Syncpoint manager 20 and the two-phase commit procedure are also used when any one of the applications 14, 16 or 18 attempts to communicate via protected conversation manager 40 using a protected conversation to an application partner in a separate environment in the same system (not shown) or to an application partner within another system (not shown) which is interconnected via a communication facility. According to the prior art synchronization point architecture, this other system/other environment is functionally identical to the execution environment 12 and includes another syncpoint manager functionally identical to 20, another synchronization point log functionally identical to 30, another protected conversation manager functionally identical to 40 and other resource managers functionally identical to 26 and 27. This other environment provides coordination and recovery functions which are separate from those of execution environment 12.

COORDINATED SYNC POINT MANAGEMENT OF 
PROTECTED RESOURCES 

FIG. 2 illustrates a syncpoint architecture according to the Present Invention. The invention includes a distributed computer operating system which supports distributed and non-distributed applications executing within their own execution environment such as a UNIX environment, OS/2 environment, DOS environment in OS/2 operating system, CMS environment in VM operating system, AIX environment in VM operating system, CICS in VM operating system, and MUSIC environment in VM operating system. A distributed application is distinguished by using a resource in another execution environment or having a communications conversation - a special type of resource - with an application partner in another execution environment. The execution environment for the resource manager or application partner may be in the same system or a different one; it can be in the same type environment or a foreign environment. A distributed application execution environment comprises one or more systems supporting applications in their own environments that might not have all the resources required; those resources are distributed elsewhere and are acquired with the aid of a communication facility. The complete environment of a distributed application appears to be full function because the distributed application involves resources that are in other environments - especially the recovery facility and communication facility.

The present invention comprises one or more systems (real machines or central electronic complexes (CECs)) 50A, D. In the illustrated embodiment, system 50A comprises a plurality of identical, distributed application environments 52A, B, and C, a conversation manager 53A and execution environment control programs 61A, B, and C which are part of a system control program 55A, and a recovery facility 70A. By way of example and not limitation, each of the environments 52A, B, and C can be an enhanced version of a VM virtual machine, recovery facility 70A can reside in another enhanced version of a VM virtual machine and system control program 55A can be an enhanced version of a VM operating system for virtual machines 52A, B, and C. Applications running in distributed application environments 52A-C in real machine 50A can communicate with application partners running in similar distributed application environments running in real machine 50D or other systems (not shown) via communication facilities 57A and D. By way of example, communication facility 57A comprises a Virtual Telecommunications Access Method ("VTAM") facility and an APPC/VM VTAM Support (AVS) gateway facility. Each distributed application environment 52 comprises a single syncpoint manager (SPM) 60A and a plurality of protected resource adapters 62A-B and 64A. A syncpoint manager allows a group of related updates to be committed or backed out in such a way that the changes appear to be atomic. The updates performed between syncpoints (i.e., commit/backout) are called a logical unit of work, and the related updates are identified through a unique name, called a logical unit of work identifier, assigned by the syncpoint manager via the recovery facility. The logical unit of work can involve multiple protected resources accessed by an application in the same distributed application environment and can also involve protected resources accessed by a partner application in other application environments via a conversation, which is one type of protected resource.

A conversation is a path established in an architected manner between two partner applications. The use of the conversation by each application is determined by the applications' design and the conversation paradigm used. When a conversation is to be included in the syncpoint process, it is called a protected conversation. Protected resources become part of the logical unit of work by contacting the syncpoint manager through a process called registration as described below in Registration of Resources for Commit Procedure. Each protected resource adapter provides an interface to a resource manager both for an application and for the syncpoint manager. (Alternatively, the protected resource adapter can be merged with the resource manager if the resource manager resides in the same execution environment as the application.)
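
Registration, as introduced above, can be pictured as the syncpoint manager keeping a list of participating adapters for each unit of work. The sketch below is illustrative only: the class `RegistrationTable`, its method names, and the string identifiers are hypothetical, and treating a repeated registration as a no-op is an assumption of this sketch.

```python
class RegistrationTable:
    """Hypothetical record of which adapters take part in each unit of work."""

    def __init__(self):
        self._registered = {}   # work_unit_id -> list of adapter names

    def register(self, work_unit_id, adapter_name):
        # Add the adapter once; registering again for the same unit changes nothing.
        adapters = self._registered.setdefault(work_unit_id, [])
        if adapter_name not in adapters:
            adapters.append(adapter_name)

    def participants(self, work_unit_id):
        # Adapters the syncpoint manager would call on commit or backout.
        return list(self._registered.get(work_unit_id, []))
```

A syncpoint manager built this way only needs to call the adapters returned by `participants` when the application requests a commit or backout, and it does not need to know what kind of resource stands behind each adapter.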

In the illustrated embodiment, protected resources are files and conversations. In other embodiments of the present invention, protected resources could be database tables, queues, remote procedure calls, and others. Protected resource adapters 62A and B handle interfaces on behalf of application 56A for resource managers 63A and B, respectively, which manage files 78A and B. Resource managers 63A and B are located in the same system. Alternatively, they could reside in a different system in a communication network. In the illustrated embodiment, conversations are managed by a conversation manager which manages the conversations or paths from an application to other partner applications running in different distributed application environments in the same system, or different distributed application environments in different systems in a communication network. If the protected conversation is between two application partners running in different application environments in the same system, e.g. between application partners running in 52A and 52B, then the conversation manager is totally contained in the system control program 55A of system 50A, and communication is made between the application partners via each protected conversation adapter 64A and 64B (not shown). If the protected conversation is between different application environments in different systems, e.g. between application partners running in 52A and 52D, then communication is made between the conversation managers 53A and 53D in systems 50A and 50D via communication facilities 57A and 57D. In this embodiment, such communications utilize a peer to peer communication format. Conversation managers 53A, D use an intra-environment format to communicate with communication facilities 57A, D. Communication facilities 57A, D translate the intra-environment format to an architected inter-system communication standard format and vice versa. By way of example, this architected inter-system communication standard format can be of a type defined by IBM's Systems Network Architecture, LU 6.2 protocol.

Recovery facility 70A serves all distributed application environments 52A, B, and C within real machine 50A. It contains log 72A, its processes handle logging for the syncpoint managers 60A, B, and C, and it provides recovery for failing syncpoints for all distributed application environments 52A, B, and C. The same is true for recovery facility 70D and its log 72D, and syncpoint manager 60D on system 50D.

When application 56A within distributed application
environment 52A desires to update files
78A and 78B, application 56A makes two separate
update requests via a file application program interface
within application 56A. The requests invoke
protected resource adapters (henceforth called protected
file adapters for this type of resource) 62A
and 62B respectively for files 78A and 78B (step
500 of FIG. 3). Based on resource manager specific
implementation, the protected file adapter knows
the file is protected. If not already registered with 
the syncpoint manager for the work unit, protected 
file adapters 62A and 62B register with syncpoint 
manager 60A the fact that they want to be involved 
in all Commit/Backout requests for this work unit 
(step 502). A "work unit" is a grouping of all 
resources, directly accessible and visible by the 
application, that participate in a sync point. It is 
generally associated with a logical unit of work 
identifier. For a further explanation of work units, 
see Local and Global Commit Scopes Tailored 
to Work Units below. Then protected file adapters 
62A and 62B contact their respective resource 
managers 63A and 63B to update files 78A and 
78B (Step 504). Return is made to application 56A. 
Next application 56A requests a syncpoint 58A, i.e. 
a commit in this case, to syncpoint manager 60A 
(Step 506). In response, syncpoint manager 60A 
initiates a two-phase commit procedure (step 508) 
to be carried out for both of its registered re- 
sources, files 78A and 78B, represented by pro- 
tected file adapters 62A and 62B and their respec- 
tive resource managers 63A and 63B. In step 508, 
syncpoint manager 60A calls each of its registered 
resources at the adapter exit syncpoint exit entry 
point, given to the syncpoint manager by each 
resource adapter during registration, with a phase 
one "prepare" call. 
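The registration and phase one steps above can be pictured with a small sketch. It is illustrative only: the class name `SyncpointManager`, the methods `register` and `phase_one`, and the use of a plain callable as the syncpoint exit are assumptions made for this example, not interfaces defined by the embodiment.

```python
from typing import Callable, Dict

# Reply strings quoted in the text; the constant names are illustrative.
REQUEST_COMMIT = "request commit"
BACKOUT = "backout"

# Assumed shape of a syncpoint exit: takes the call name, returns a reply.
SyncpointExit = Callable[[str], str]


class SyncpointManager:
    """Toy syncpoint manager that tracks registrations per work unit."""

    def __init__(self) -> None:
        # work unit -> {adapter name: syncpoint exit entry point}
        self._exits: Dict[int, Dict[str, SyncpointExit]] = {}

    def register(self, work_unit: int, adapter: str,
                 exit_point: SyncpointExit) -> bool:
        """Register an adapter once per work unit.

        Returns False when the adapter was already registered, mirroring
        the "if not already registered" check in the text.
        """
        exits = self._exits.setdefault(work_unit, {})
        if adapter in exits:
            return False
        exits[adapter] = exit_point
        return True

    def phase_one(self, work_unit: int) -> Dict[str, str]:
        """Call every registered exit with "prepare" and collect the replies."""
        return {name: exit_point("prepare")
                for name, exit_point in self._exits.get(work_unit, {}).items()}
```

In this sketch the exit entry point is supplied at registration time, as in the text, so the manager needs no knowledge of the resource type behind each adapter.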

During the course of executing its two-phase 
commit procedures, syncpoint manager 60A issues 
a request to recovery facility 70A to force log 
("force log" means to make sure the information 
was written to the actual physical device before 
returning to syncpoint manager 60A) on log 72A 
phase one syncpoint manager information (Step 
508). This information includes the logical unit of 
work identifier, the syncpoint manager state and 
the names and other pertinent information about 
each registered protected resource adapter partici- 
pating in the commit request. This information was 
given to syncpoint manager 60A when file adapters 
62A and 62B registered. Syncpoint manager 60A's 
state is determined by the rules of the two-phase 
commit paradigm being followed. For example, the 
two-phase commit paradigm is of a type described 
by Systems Network Architecture LU 6.2 Reference:
Peer Protocols, SC31-6808, Chapter 5.3
Presentation Services - Sync Point verbs pub- 
lished by the IBM Corporation. If a failure occurs 
during the syncpoint processing, the syncpoint 
manager state is used to determine the outcome 
(Commit or Backout) of the logical unit of work. As 
per the rules of the two-phase commit paradigm 
used by this embodiment, the syncpoint manager 
phase one state is Initiator, Syncpoint Manager 
Pending. If the first phase of the two-phase commit 
procedure is not interrupted and is completed 
(decision block 512), syncpoint manager 60A issues
a second request to recovery facility 70A to
force log 72A to its phase two state. Based on the 
replies from the protected file adapters and re- 
source managers and the rules of the two-phase 
commit paradigm being used, syncpoint manager 
60A knows its second phase decision. In this em- 
bodiment, the paradigm is as follows. If one or 
more protected resource adapters respond
"backout" to the phase one request, the phase two 
decision is "backout"; if all respond "request com- 
mit", the decision is "commit". In the example 
illustrated in Figure 3, protected file adapters 62A 
and 62B responded "request commit" (Step 510) 
and the phase two state is logged by syncpoint 
manager 60A as Initiator Committed. It should be 
noted that in this example, file managers 63A and 
63B after replying "request commit" through their 
respective file adapters 62A and 62B to the phase 
one request are in a state of "indoubt", that is they 
can commit or backout the file updates based on 
the phase two decision from syncpoint manager 
60A. 
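The phase two decision rule stated above (any "backout" reply forces backout; unanimous "request commit" yields commit) can be written as a short function. This is a minimal sketch; the function name `phase_two_decision` and the choice to reject empty or unrecognized replies are assumptions of the example.

```python
from typing import Iterable

REQUEST_COMMIT = "request commit"
BACKOUT = "backout"


def phase_two_decision(replies: Iterable[str]) -> str:
    """Return "commit" or "backout" from the phase one replies."""
    collected = list(replies)
    if not collected:
        # Assumption: a syncpoint with no registered resources is an error here.
        raise ValueError("no phase one replies to decide on")
    unexpected = [r for r in collected if r not in (REQUEST_COMMIT, BACKOUT)]
    if unexpected:
        raise ValueError(f"unrecognized phase one replies: {unexpected}")
    # One or more "backout" replies decide the outcome for every resource.
    return "backout" if BACKOUT in collected else "commit"
```

For the example of Figure 3, `phase_two_decision(["request commit", "request commit"])` evaluates to `"commit"`.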

After logging, syncpoint manager 60A then issues
the phase two call with the decision of commit
to protected file adapters 62A and 62B (Step
513). When the file managers 63A and 63B receive
the phase two commit decision, each proceeds to 
do whatever processing is necessary to commit the 
data, i.e. make the updates permanent (Step 516). 
When a successful reply is received from protected 
file adapters 62A and 62B on behalf of their respective
resource managers and there is no interruption
in syncpoint processing (decision block
514), syncpoint manager 60A calls recovery facility
70A to write to log 72A the state of "forget" for this 
logical unit of work (Step 515). This does not have
to be a force log write; that is, the log record
can be written to a data buffer and return made
to syncpoint manager 60A at once. The buffer can be
written to the physical media at a later point in time.
Based on the two phase commit paradigm used in 
this embodiment, syncpoint manager 60A updates 
the logical unit of work identifier (increments it by 
one) which guarantees uniqueness for the next 
logical unit of work done by application 56A. The 
syncpoint manager then returns to application 56A
(Step 515A).
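The difference between a force log write and an ordinary log write can be modelled in a few lines. The sketch below is an in-memory stand-in, not the recovery facility's log format: the class `RecoveryLog`, its `write` and `flush` methods, and the record strings are all hypothetical. A real implementation would flush to a device, for example with `os.fsync` on a file descriptor.

```python
from typing import List


class RecoveryLog:
    """In-memory model of a log with a buffer and a 'physical' device."""

    def __init__(self) -> None:
        self.buffer: List[str] = []  # accepted records not yet on the device
        self.device: List[str] = []  # stands in for the physical medium

    def write(self, record: str, force: bool = False) -> None:
        """Accept a record; with force=True, do not return until it is on the device."""
        self.buffer.append(record)
        if force:
            self.flush()

    def flush(self) -> None:
        """Move every buffered record to the device, preserving order."""
        self.device.extend(self.buffer)
        self.buffer.clear()


log = RecoveryLog()
log.write("phase one: Initiator, Syncpoint Manager Pending", force=True)
log.write("phase two: Initiator Committed", force=True)
log.write("forget")  # not forced; may reach the device later
```

After these three calls the two forced records are on the device, while the "forget" record is still only in the buffer until a later flush.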

The two-phase commit paradigms have rules 
for recovery processing, such that recovery facility 
70A knows how to complete an interrupted sync- 
point (Step 517 and FIG. 4). 

If syncpoint manager 60A's process was interrupted,
decision block 514 leads to step 517 in
which syncpoint manager 60A contacts recovery
facility 70A. In step 517 recovery facility 70A receives
the logical unit of work identifier and information
about the associated failed resource or
resources from syncpoint manager 60A. Recovery
facility 70A then finds the correct log entry (Step
518 of FIG. 4). The log information, in combination
with the two phase commit paradigm being used,
allows recovery facility 70A's procedures to complete
the interrupted syncpoint processing (Step
519). Based on the two-phase commit paradigm
being used in this illustrated example, if the syncpoint
state entry for the logical unit of work identifier
on log 72A is Initiator, Syncpoint Manager
Pending, each failed resource manager 63A or 63B
will be told to backout; otherwise, each will be told
the syncpoint manager phase two state which is on
the log, i.e. commit or backout (Step 520). Once
the recovery state is determined, recovery facility
70A will start recovery processes with each failed
protected resource manager as described below in
Log Name Exchange For Recovery of Protected
Resources and in Recovery Facility For
Incomplete Sync Points For Distributed
Application. This processing consists of exchanging
log names and a comparison of states whereby the
recovery process of recovery facility 70A tells the
failed resource manager 63A or 63B what to do, i.e.
commit or backout, and the resource manager 63A
or 63B tells the recovery process what it did. The
recovery process of recovery facility 70A knows
how to contact the failed resource based on information
written by syncpoint manager 60A during
its phase one logging activity. If the failed resource
manager can be contacted (decision block 521)
recovery takes place immediately (Step 522). After
recovery takes place with each failed resource
(decision block 523) return can be made to syncpoint
manager 60A (Step 523A). Syncpoint manager
60A will then return to the application 56A
(Step 515A). If the failed resource manager could
not be contacted, decision block 521 leads to decision
block 524 in which recovery facility 70A
checks to see if it must complete the recovery
processing before returning to application 56A.
This decision is based on information contained in
the log record for the logical unit of work written by
the syncpoint manager during phase one logging. If
it must complete recovery, the recovery process
keeps trying to contact the failed resource (Step
525); if it can complete the recovery at a later point
in time, i.e. wait for recovery was previously selected,
recovery facility 70A returns to syncpoint
manager 60A with the intent of the recovery processing
(i.e. commit or backout) and an indication
that the recovery will be completed later (Step 526)
as described below in Asynchronous Resynchronization
of a Commit Procedure. When all
resources are recovered (Step 525A), syncpoint
manager 60A returns to application 56A (Step 515)
with this information.
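The recovery rules of steps 520 through 526 reduce to two small decisions: which outcome to tell the failed resource managers, and how to proceed when a resource manager cannot be reached. The following is a minimal sketch under stated assumptions; the function names, the parameter `must_complete_before_return`, and the returned action strings are hypothetical labels for the branches described above.

```python
from typing import Optional

PENDING = "Initiator, Syncpoint Manager Pending"


def recovery_direction(logged_state: str,
                       logged_decision: Optional[str] = None) -> str:
    """Outcome to tell each failed resource manager (step 520)."""
    if logged_state == PENDING:
        # Phase one never completed, so the logical unit of work is backed out.
        return "backout"
    if logged_decision not in ("commit", "backout"):
        raise ValueError("log holds no phase two decision")
    # Otherwise the logged phase two decision is replayed.
    return logged_decision


def recovery_action(reachable: bool, must_complete_before_return: bool) -> str:
    """How the recovery facility proceeds (decision blocks 521 and 524)."""
    if reachable:
        return "recover now"      # step 522
    if must_complete_before_return:
        return "keep retrying"    # step 525
    return "complete later"       # step 526: return intent to the syncpoint manager
```

The second function takes its flag from the phase one log record in the text; here it is simply passed in as a boolean.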

Figure 2 also illustrates that application 56A
can be part of a distributed application. This means
there is at least one partner application that can
work with application 56A to complete its processing.
To establish a distributed application, application
56A initiates a protected conversation which
starts partner application 56D in system 50D by
invoking the application program conversation initiate
interface and indicates the conversation is to
be protected (FIG. 5A, Step 530). This request is
handled by protected conversation adapter 64A.
Protected conversation adapter 64A asks syncpoint
manager 60A for the logical unit of work identifier
and includes it along with a unique conversation
identifier in the information sent to the remote
system 50D. Protected conversation adapter 64A
then sends the request to the conversation manager
53A which sends it to communications facility
57A. Protected conversation adapter 64A gets an
indication that the conversation initiate request was
(or will be) sent from communications facility 57A
to communications facility 57D. At this time protected
conversation adapter 64A registers with syncpoint
manager 60A (Step 532). Asynchronously to
this registration process, the conversation initiate
request is transmitted to communication facility
57D, and then to conversation manager 53D, and
then to protected conversation adapter 64D (Step
532 of FIG. 5A). Protected conversation adapter
64D retrieves the logical unit of work identifier and
unique conversation identifier and registers with
syncpoint manager 60D on behalf of the conversation
manager (Step 532). Protected conversation
adapter 64D at this time also gives syncpoint manager
60D the logical unit of work identifier it received
on the conversation initiate request. Protected
work done by application 56D will be associated
with this logical unit of work originally started
by application 56A (Step 532). The logical unit of
work identifier will also be assigned to a new work
unit for application 56D and application 56D is
started.
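How the logical unit of work identifier travels with the conversation initiate request can be sketched as follows. Everything named here is hypothetical: `InitiateRequest`, `RemoteSyncpointManager`, and the `accept` method are illustrative, and the sketch simplifies the text by letting the remote syncpoint manager hand out work unit numbers, whereas the embodiment assigns them in the execution environment control program.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InitiateRequest:
    """Data carried on a protected conversation initiate request."""
    luwid: str            # logical unit of work identifier from the initiator
    conversation_id: str  # unique conversation identifier


class RemoteSyncpointManager:
    """Toy model of the partner side adopting the initiator's LUWID."""

    def __init__(self) -> None:
        self._next_work_unit = 1
        self.luwid_by_work_unit: Dict[int, str] = {}
        self.conversations: Dict[int, List[str]] = {}

    def accept(self, request: InitiateRequest) -> int:
        """Create a local work unit and associate it with the received LUWID."""
        work_unit = self._next_work_unit
        self._next_work_unit += 1
        self.luwid_by_work_unit[work_unit] = request.luwid
        self.conversations[work_unit] = [request.conversation_id]
        return work_unit
```

The point of the sketch is that the work unit number is local to the partner, while the LUWID stored against it is the one received from the initiating side.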

Thus, applications 56A and 56D are partner
applications, and together they are called a distributed
application. The protected conversation allows
application 56A and 56D to send and receive data
in a peer to peer manner. This means each side,
application 56A or application 56D, can originate
the send or receive which is determined by the
application writer and the paradigm being used by
the communication manager. As described above,
a protected conversation is registered with both
syncpoint managers by protected conversation
adapters 64A and 64D, respectively. During syncpoint
processing for the application that issued the
first commit, a protected conversation adapter represents
a resource to the syncpoint manager that
must respond if it can commit (first phase) and
whether or not it successfully performed the work
requested (second phase). To the other protected
conversation adapter receiving the first phase call
from its partner protected conversation adapter, the
protected conversation is a partner syncpoint manager
over which it will receive phase one and
phase two orders. Its local syncpoint manager acts
like a resource manager, that is the protected conversation
adapter will get the results of what the
syncpoint manager's resources did (phase one and
phase two). It should be noted that the syncpoint
paradigm used provides rules for which application
partner can issue the first commit. In this example,
any application partner can issue the commit first
and this is determined by the distributed application
design.

Application 56A gets control with the indication 
that the request to start was successfully sent by 
communication facility 57A. At this point application 
56A is able to send requests to application 56D 
and application 56A sends a request to application 
56D over the established conversation. In this illus- 
trated example, this request eventually causes ap- 
plication 56D to invoke a file application program 
interface to update file 78D. As described above, 
the update request causes protected file adapter 
62D to register with syncpoint manager 60D under 
the same work unit (previously assigned for ap- 
plication 56D (Step 532) when application 56D was 
started) (Step 533). Also in step 533, application 
56D sends a reply to application 56A over the 
conversation indicating that it completed its work. 
Next, application 56A issues update requests for 
files 78A and 78B. As previously described, pro- 
tected file adapters 62A and 62B had previously 
registered with syncpoint manager 60A and they 
each contact resource managers 63A and 63B to
perform the updates (Steps 533 and 533A). 

Application 56A now issues a commit 58A to 
syncpoint manager 60A (Step 534). As described 
above, syncpoint manager 60A contacts recovery
facility 70A for its phase one logging and issues a 
phase one "prepare" call to each registered re- 
source (Steps 534A and 535A). Protected file 
adapters 62A and 62B behave as described above. 
When protected conversation adapter 64A receives 
the phase one "prepare" call, it sends an inter- 
system architected "prepare" call over the pro- 
tected conversation it represents, i.e. the one 
originally established by application 56A to applica- 
tion 56D (Step 535). Protected conversation adapt- 
er 64D recognizes this "prepare" call and gives 
application 56D, which had issued a conversation 
message receive call, a return code requesting it to 
issue a commit (Step 536). Application 56D then 
issues a commit 58D to syncpoint manager 60D 
(Step 537). As described above, syncpoint man- 
ager 60D contacts its recovery facility, in this case 
recovery facility 70D to force log 72D with phase 
one information (Step 538). Because application
56A issued the original commit request which
caused application 56D to subsequently issue a
commit, and based on the two-phase commit paradigm
used in this embodiment, syncpoint manager
60D's phase one state is "Initiator Cascade, Syncpoint
Manager Pending" (Step 538). Syncpoint
manager 60D contacts protected file adapter 62D
with a phase one "prepare" call (Step 538). Protected
file adapter 62D and its associated resource
manager 63D perform phase one processing as
previously described and return a reply of
"request commit".

In this example, there were no interruptions
and decision block 539 leads to step 540 in which
syncpoint manager 60D contacts recovery facility
70D to force log 72D to a state of "Agent, Indoubt".
This state means that if an interruption subsequently
occurs such that syncpoint manager 60D
does not receive the phase two decision from syncpoint
manager 60A, it would have to wait for
recovery processing from recovery facility 70A to
complete its syncpoint processing. Syncpoint manager
60D then contacts protected conversation
adapter 64D with a reply of "request commit".
Protected conversation adapter 64D then sends an
intersystem architected "request commit" reply to
protected conversation adapter 64A (step 541)
which in turn replies "request commit" to syncpoint
manager 60A (Step 542). As described above, syncpoint
manager 60A received "request commit"
from protected file adapters 62A and 62B (Step
535A). Since there are no interruptions in the illustrated
example, decision block 543 leads to step
544 in which syncpoint manager 60A contacts the
recovery facility 70A to force log 72A to a phase
two state of "Initiator, committed" (Step 544). Syncpoint
manager 60A then calls each registered
protected resource adapter with the phase two decision
of "Committed" (FIG. 5B, Step 545). Protected
file adapters 62A and 62B process the commit
decision as described above (Step 545A).
When protected conversation adapter 64A receives
the commit decision, it sends an intersystem architected
"committed" call over the protected conversation
it represents, i.e. the one originally established
by application 56A to application 56D (Step
546). Protected conversation adapter 64D receives
the "commit" call and replies to syncpoint manager
60D the phase two decision of "commit" (Step
547).

As described above, syncpoint manager 60D
contacts recovery facility 70D to force log 72D to
the phase two state. Because application 56A issued
the original commit request which caused
application 56D to subsequently issue a commit,
and based on the two-phase commit paradigm
used in this embodiment, syncpoint manager 60D's
phase two state is "Initiator Cascade, Committed"
(Step 548). Syncpoint manager 60D contacts protected
file adapter 62D with the phase two commit
decision (Step 549). Protected file adapter 62D
and its associated resource manager 63D perform
commit processing as previously described and
return a reply of "forget". Since there were no
interruptions (decision block 550), syncpoint manager
60D contacts recovery facility 70D to log in
log 72D a state of "Forget" for the syncpoint log
record for this logical unit of work (Step 551).
"Forget" means that syncpoint processing is complete
and the log record can be erased. Syncpoint
manager 60D then contacts protected conversation
adapter 64D with a reply of "forget". Based on the
two-phase commit paradigm used in this embodiment,
syncpoint manager 60D increments the logical
unit of work identifier by one and returns to
application 56D with an indication that the commit
completed successfully (Step 552). Updating the
logical unit of work identifier guarantees uniqueness
for the next logical unit of work done by the
distributed application.
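The states logged by syncpoint manager 60D in the uninterrupted commit example (steps 538, 540, 548 and 551) form a fixed sequence, which the sketch below encodes as a tuple. The names `CASCADE_COMMIT_PATH` and `next_state` are assumptions of the example, and only the successful commit path is covered; the backout and interrupted paths are not modelled.

```python
# Logged states of the cascaded syncpoint manager on the successful commit path,
# in the order the text logs them.
CASCADE_COMMIT_PATH = (
    "Initiator Cascade, Syncpoint Manager Pending",  # step 538
    "Agent, Indoubt",                                # step 540
    "Initiator Cascade, Committed",                  # step 548
    "Forget",                                        # step 551
)


def next_state(current: str) -> str:
    """Return the state logged after `current` on the commit path."""
    index = CASCADE_COMMIT_PATH.index(current)  # ValueError for an unknown state
    if index == len(CASCADE_COMMIT_PATH) - 1:
        raise ValueError("syncpoint processing is already complete")
    return CASCADE_COMMIT_PATH[index + 1]
```

A recovery routine could consult the last logged state in this sequence to see how far the cascaded syncpoint had progressed before an interruption.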

Next, protected conversation adapter 64D 
sends an intersystem architected "forget" reply to 
protected conversation adapter 64A which in turn 
replies "forget" to syncpoint manager 60A (Step 
553). As described above syncpoint manager 60A 
also receives "forget" replies from protected file 
adapters 62A and 62B (Step 545A). Assuming 
there are no interruptions, decision block 554 leads 
to step 555 in which syncpoint manager 60A contacts
recovery facility 70A to log in log 72A a state
of "forget" for this logical unit of work. Again based 
on the paradigm of the two-phase commit process 
being used, syncpoint manager 60A then incre- 
ments the logical unit of work identifier by one 
(Step 556). This change guarantees a new unique 
logical unit of work identifier for the distributed 
application. Syncpoint manager 60A then notifies 
application 56A that the Commit request completed 
successfully. If during the two-phase commit pro- 
cedure, the syncpoint processing was interrupted 
in either syncpoint manager 60A, syncpoint man- 
ager 60D or both, recovery facility 70A and recov- 
ery facility 70D would implement a recovery opera- 
tion which is represented in the logical flow by 
steps 557, 558 and 559, 560 and is more fully described
below in Log Name Exchange For Recovery
of Protected Resources, Recovery Facility
For Incomplete Sync Points For Distributed
Application, and Asynchronous Resynchronization
of a Commit Procedure.

FIGURE 54 is an alternate embodiment to that
illustrated in FIGURE 2 and can best be described
by comparison to FIGURE 2. In both FIGURE 2
and FIGURE 54, application environments, system
facilities, and resource managers are distributed.
However, in FIGURE 2 one physical device, system
50A, contains multiple application environments
52A, B, C, two resource managers 63A, B,
recovery facility 70A and communication facility
57A. FIGURE 2 shows that System Control Program
55A contains the conversation manager 53A
and the Syncpoint Manager 60A, B, C. System 50A
of FIGURE 2 can be a mainframe computer and
configurations of this type are often called centralized
computing. Also, FIGURE 2 shows application
environments in system 50A connected to application
environments in system 50D through a communication
network. In contrast, FIGURE 54 shows
each application environment, system facility and
resource manager in a separate physical machine.
This configuration is called distributed computing.
In this environment systems 90A, B, C, 110E, 114F,
and 120G are programmable workstations similar in
function but not necessarily similar in size and
power to systems 50A, D of FIGURE 2. The systems
of FIGURE 54 are connected by a communication
network which, for example, is a local area
network (LAN). Application environments 92A, B,
and C of FIGURE 54 are functionally equivalent to
application environments 52A, B, and C of FIGURE
2. However, each application environment 92A, B,
and C is contained in a separate programmable
workstation. Each system control program 95A, B,
and C of FIGURE 54 is functionally equivalent to
system control program 55A of FIGURE 2. Each
system control program 95A, B, and C contains (a)
a Syncpoint Manager 100A, B, or C which is functionally
equivalent to Syncpoint Managers 60A, B,
and C, (b) execution environment control programs
91A, B, and C which are functionally equivalent to
execution environment control programs 61A, B,
and C, (c) protected conversation adapters (PCA)
104A, B, and C which are functionally equivalent to
PCA 64A, B, and C, (d) resource adapters (RA)
102A, B, C and 103A, B, C which are functionally
equivalent to resource adapters 62A, B, and (e)
conversation managers 93A, B, C which are functionally
equivalent to conversation managers 53A, B, C
and communication facilities 97A, B, C each of which
is functionally equivalent to communication facility
57A. However, in the example of FIGURE 54, the
communication facility is part of each system control
program 95A, B, and C and not in its own
execution environment. Also in FIGURE 54, resource
managers 112E and 113F and their respective
files/logs 115E, 116E and 117F, 118F are functionally
equivalent to resource managers 63A and
63B and their respective files/logs 78A, 800A and
78B, 800B of FIGURE 2. However, resource managers
112E and 113F are each on separate programmable
workstations. Recovery facility 121G
and its log 122G in FIGURE 54 are functionally
equivalent to recovery facility 70A and its log 72A
in FIGURE 2. However, recovery facility 121G is in
a programmable workstation. System 50D of FIGURE
54 is the same as system 50D of FIGURE 2
and is included to show the versatility of the network.
A description of syncpoint processing in this
environment can be obtained by substituting the
correct numbers from FIGURE 54 for the corresponding
numbers from FIGURE 2 as just described
into the syncpoint processing description
above. Thus, there is a wide range of computer
systems and networks in which the present invention
can reside.

It is possible in system 50A, FIG. 2, for recov- 
ery facility 70A to become unavailable for a variety 
of reasons. Accordingly, system 50A provides 
back-ups. For example, if recovery facility 70A is 
part of an execution environment which also con- 
trols a resource manager and the resource man- 
ager encounters a disabling failure, then recovery 
facility 70A will also become inoperative. In the
example illustrated in FIG. 28, system 50A includes 
more than one execution environment dedicated to 
a resource manager, and each execution environ- 
ment containing the resource manager also con- 
tains a recovery facility program, although only one 
recovery facility in a system may be active at one 
time. 

Specifically, FIG. 28 illustrates that in system
50A there are three identical execution environments
52E, 52F and 52G each containing a resource
manager (program) 260A, 260B and 260C,
respectively. Preferably, each resource manager
260A, 260B and 260C is an enhanced version of
the Shared File System (SFS) resource manager
of the VM/SP Release 6 operating system
("VM" is a trademark of the IBM Corp. of Armonk,
N.Y.) and associated resources 262A, 262B
and 262C, respectively. In addition, each execution
environment 52E, 52F and 52G also contains a
program 70A, B and C to provide the function of
recovery facility 70A illustrated in FIG. 23. An advantage
of locating each recovery facility in an
execution environment which includes the shared
file system is that the shared file system includes
services, i.e. communication and tasking services,
that the recovery facility can use. The communication
services handle communication protocols, interrupt
processing, and message management. In
system 50A of FIG. 28, recovery facility 70A is initially
identified to the system control program as the
recovery facility associated with recovery facility
log 72A when the execution environment 52E is
initialized. This is accomplished by specification of
a parameter as input to the execution environment
52E's initialization process. Execution environment
52E identifies itself to the system control program
as the recovery facility and as the target of all
communication in system 50A for the
sync_point_log resource identifier. (Refer to section
'Log Name Exchange for Recovery of Protected
Resources' for description of term
sync_point_log resource identifier.) This
sync_point_log resource identifier must be
unique in system 50A and can be associated with
only one execution environment at any time. In the
illustrated embodiment execution environment 52E
defines a nonvolatile storage area which contains
recovery facility log 72A so that specification of
execution environment 52E automatically implies
log 72A as the resource recovery log, absent an
overruling specification of another storage area.

However, if execution environment 52E is not
available, the user can activate recovery facility
70B or 70C as a backup and move log 72A to
execution environment 52F or 52G by specifying
the aforesaid parameter at initialization of execution
environment 52F or 52G and specifying to the
execution environment the location of recovery facility
log 72A. The user specifies the location of log
72A by giving the system control program the
necessary commands from the chosen execution
environment 52F or 52G to identify the location of
the nonvolatile storage area that contains recovery
facility log 72A.

All the information that is needed by the recovery
facility to complete resynchronization after a
syncpoint failure is contained in recovery facility
log 72A, and no information required for the syncpoint
recovery is contained in the execution environment,
resource manager, or associated nonvolatile
storage. Therefore, any execution environment
with the resource manager that contains the
recovery facility program can act as the recovery
facility 70A as long as the active recovery facility
has access to log 72A. The back-up transfer of the
recovery facility function to execution environment
52F is indicated by communication path 272B, and
the back-up transfer of the recovery facility function
to execution environment 52G is indicated by communication
path 272C.

Communication between any of the syncpoint
managers 60A, 60B, or 60C in any application
environment with the recovery facility 70 is accomplished
by using the sync_point_log resource
identifier when initiating a conversation through the
system control program to the recovery facility.
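The rule that the sync_point_log resource identifier is associated with only one execution environment at a time, together with the back-up transfer described above, can be sketched as a small registry. The class `SystemControlProgram` and its methods `identify`, `withdraw` and `route` are hypothetical names chosen for this example, as is the identifier value used in the usage lines.

```python
from typing import Dict


class SystemControlProgram:
    """Toy registry mapping a resource identifier to one execution environment."""

    def __init__(self) -> None:
        self._target: Dict[str, str] = {}

    def identify(self, resource_id: str, environment: str) -> None:
        """Record `environment` as the recovery facility for `resource_id`."""
        owner = self._target.get(resource_id)
        if owner not in (None, environment):
            # Only one execution environment may hold the identifier at a time.
            raise RuntimeError(f"{resource_id} is already held by {owner}")
        self._target[resource_id] = environment

    def withdraw(self, resource_id: str) -> None:
        """Drop the association, e.g. when the environment becomes unavailable."""
        self._target.pop(resource_id, None)

    def route(self, resource_id: str) -> str:
        """Return the environment that conversations for `resource_id` reach."""
        return self._target[resource_id]


scp = SystemControlProgram()
scp.identify("SYNC_POINT_LOG_1", "52E")  # initial recovery facility
scp.withdraw("SYNC_POINT_LOG_1")         # 52E becomes unavailable
scp.identify("SYNC_POINT_LOG_1", "52F")  # backup activated with access to log 72A
```

Because syncpoint managers address the identifier rather than a particular execution environment, conversations started after the switch reach the backup without any change on their side.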

LOCAL AND GLOBAL COMMIT SCOPES TAILORED TO WORK UNITS

The foregoing flowcharts of Figures 5A, B illustrate
an example where a single logical unit of work
or commit scope extends to two application partners
in different systems, for example, to resources
and applications in more than one execution environment
in different systems, and the commit procedure
is coordinated between the two application
partners. The following describes in detail this process
as well as the ability of System 50A to provide
separate work units or commit scopes for the
same application in the same execution environment.
Thus, all systems 50 can tailor commit
scopes to the precise resources which are involved
in one or more related work units.

As noted above, a "work unit" is the scope of 
resources that are directly accessible by one ap- 
plication and participate in a common syncpoint. 
For example (in Figure 2), the resources coupled to 
resource adapters 62A and 62B and protected con- 
versation adapter 64A are all directly accessible by 
application 56A and therefore, could all have the 
same work unit. They would all have the same 
work unit if they all were involved in related work 
requests made by application 56A. The work unit 
identifiers are selected by the system control pro- 
gram 55 and are unique within each execution 
environment. In the illustrated embodiment, the 
system control program 55A comprises a conver- 
sation manager 53A, and an execution environment 
control program 61 for each execution environment 
52. By way of example and not limitation, execution 
environment control program 61A can be an enhanced
CMS component of the VM/SP Release 6
operating system ("VM" is a trademark of IBM 
Corp. of Armonk, NY). This execution environment 
control program controls the execution of applica- 
tion 56A and, as noted above, assigns the work unit 
identifications. Thus, the work unit identifications 
are unique within each execution environment. The 
application uses the same work unit for multiple, 
related work requests and different work units for 
unrelated work requests. A "logical unit of work" 
identifier is a globally unique (network wide) iden- 
tifier for all resources that are involved in related 
work requests and encompasses all the related 
work requests. The logical unit of work identifiers 
are assigned by the recovery facility 70 of the
system in which the work request originated and, in
this embodiment, each comprises:

(1) A network identifier which identifies a group 
of interconnected systems; 

(2) A system identifier which identifies one com- 
munication facility within the network; 

(3) An instance number that provides a locally 
unique element to the LUWID (for example, a 
timestamp may be used); and 

(4) A sequence number which identifies a par- 
ticular syncpoint instance. 

By way of example, this is of the type defined
by Systems Network Architecture LU 6.2 Reference:
Peer Protocols, SC31-6808, Chapter 5.3,
Presentation Services - Sync Point verbs. The
syncpoint manager 60 requests the logical unit of 
work identifier (LUWID) from the recovery facility 
when a protected conversation is involved in the
work unit or when a two-phase commit procedure
will be required, even if the work request does not
require a protected conversation. The LUWID may
be requested by the resource adapter by calling
the syncpoint manager, or by the syncpoint manager
by requesting an LUWID at the beginning of
commit processing if one has not been acquired
yet and it is needed for the commit. As described
in more detail below, a work unit is associated with
a LUWID when protected resources such as a
protected conversation or multiple protected resources
are involved in the work unit. A work unit
can include a mixture of multiple files and multiple
file repositories, other protected resources and other
participating resource managers, and protected
conversations between different parts of a distributed
application. In the case of a protected conversation,
a single logical unit of work extends
between two or more application partners, even
though each application partner assigns a different
work unit (within each execution environment) to
the same protected conversation and to other resources
directly accessed by this application.
Thus, each application partner associated with a
protected conversation assigns and uses its own
work unit locally, but the work units of the two or
more application partners refer to the same distributed
logical unit of work. It should be noted that
each execution environment is ignorant of the work
unit identifications assigned by the other execution
environment, and it is possible by coincidence only
that work units in different execution environments
have the same identifier. Work units with the extended
scope described above, rather than
LUWIDs, are used to define local commit scopes
because existing applications can benefit from the
extended function with a minimum of change.
Changing from work units to LUWIDs would be
cumbersome and would require existing applications
to change.
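
The relationship between locally unique work units and the globally unique LUWID can be sketched as follows. This is only an illustrative model, not the patented implementation: the class names, the method names (`new_luwid`, `luwid_for`, `associate`) and the use of a simple counter for the instance number are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Luwid:
    """The four LUWID components listed above."""
    network_id: str   # (1) group of interconnected systems
    system_id: str    # (2) one communication facility within the network
    instance: int     # (3) locally unique element (a timestamp could be used)
    sequence: int     # (4) particular syncpoint instance


class RecoveryFacility:
    """Hypothetical stand-in for recovery facility 70, which assigns LUWIDs."""

    def __init__(self, network_id, system_id):
        self.network_id = network_id
        self.system_id = system_id
        self._instances = count(1)  # assumption: a counter instead of a timestamp

    def new_luwid(self):
        return Luwid(self.network_id, self.system_id, next(self._instances), 1)


class SyncpointManager:
    """Hypothetical stand-in for syncpoint manager 60."""

    def __init__(self, recovery_facility):
        self._recovery_facility = recovery_facility
        self._luwid_by_work_unit = {}

    def luwid_for(self, work_unit):
        # Acquire a LUWID only on first demand, then reuse it for the work unit.
        if work_unit not in self._luwid_by_work_unit:
            self._luwid_by_work_unit[work_unit] = self._recovery_facility.new_luwid()
        return self._luwid_by_work_unit[work_unit]

    def associate(self, work_unit, luwid):
        # Used on the partner side: tie a received LUWID to a local work unit.
        self._luwid_by_work_unit[work_unit] = luwid


# Originating side: work unit Y obtains a LUWID on demand.
spm_a = SyncpointManager(RecoveryFacility("NET1", "SYSA"))
luwid_y = spm_a.luwid_for("Y")

# Partner side: a different local work unit Z refers to the same LUWID.
spm_d = SyncpointManager(RecoveryFacility("NET1", "SYSD"))
spm_d.associate("Z", luwid_y)
```

In this sketch the two environments never exchange work unit identifiers; only the LUWID travels with the conversation, which mirrors the statement that each execution environment is ignorant of the other's work unit identifications.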

Figures 6-9 illustrate, by example, a process
for establishing different work units and logical
units of work for the same application 56A, and
another logical unit of work which extends to multiple
resources associated with a plurality of application
partners 56A and 56D running in different
systems 50A and 50D, respectively. In the illustrated
example in Figure 7, application 56A is initiated
and obtains a work unit identifier X from
execution environment control program 61A (Step
928). The execution environment control program
is responsible for selecting a unique work unit
identifier within each execution environment. Then,
application 56A makes a work request to resource
adapter 62A within execution environment 52A to
update a file located in resource 78A, specifying
that the work request is to be made under work
unit X, or by default, the work request is assigned
to be under a "current work unit" designated by
application 56A (Step 930). If the resource adapter
requests the LUWID for work unit X (Decision
Block 935), then syncpoint manager 60A requests
a LUWID from recovery facility 70A to encompass
work unit X if one is not already assigned and
associates it with work unit X. Then the syncpoint
manager returns the LUWID to the resource adapter
(Step 936). In the illustrated example in Figure 6,
resource 78A (accessed via resource adapter 62A)
is not a protected conversation so Decision Block 
937 (Figure 7) leads to Step 939 in which the 
resources are updated. If resource adapter 62A 
was not previously registered for work unit X 
(Decision Block 933), then resource adapter 62A 
registers with syncpoint manager 60A (Step 934). 
In the foregoing example, application 56A does not 
desire to perform additional work under the same 
work unit (Decision Block 940), and does not desire 
to do new unrelated work (Decision Block 941), so 
the next step is for application 56A to issue a 
commit (Step 942). In response, syncpoint man- 
ager 60A initiates the one-phase commit procedure 
(Step 944). However, it should be noted that ap- 
plication 56A is not required to issue the commit 
for work unit X before beginning some other un- 
related work request (Decision Block 941). In this 
particular case, the syncpoint manager is perform- 
ing a one-phase commit procedure and so, does 
not need a LUWID. 

In the illustrated example, application 56A next 
begins the following process to do work indepen- 
dently of work unit X. Application 56A requests a 
new work unit from execution environment control 
program 61A, and execution environment control
program 61A returns work unit Y (Step 928). Next,
application 56A makes a request to update re- 
source 78B via resource adapter 62B under work 
unit Y (Step 930). If the resource adapter requests 
the LUWID for work unit Y (Decision Block 935), 
syncpoint manager 60A obtains from recovery fa- 
cility 70A a LUWID and associates it with work unit 
Y (Step 936). At this time, the logical unit of work 
for work unit Y extends only to resource manager 
63B. Next, an update to resource 78B is imple- 
mented (Step 939). Since resource adapter 62B 
has not yet registered for work unit Y, it registers 
with syncpoint manager 60A (Step 934). 

Next, application 56A desires to do additional
work under the same work unit Y (Decision Block
940), e.g., to make changes to data in other resources.
In the example illustrated in Figure 6, the
other resource is a protected conversation, and the
protected conversation is used to access resources
in system 50D via distributed application partner
56D. In the illustrated example, this is the beginning
of a new protected conversation. Thus, application
56A initiates a new protected conversation
with application 56D under work unit Y (Step 930).

Because protected conversation adapter 64A 
requests the LUWID for work unit Y, the syncpoint 
manager invokes the recovery facility if a LUWID 
has not yet been assigned and associated with the 
work unit, and returns the LUWID to the protected 
conversation adapter (Step 936). (The protected 
conversation adapter will need the LUWID when 
the conversation is initiated (Step 947).) Decision 
Block 937 leads to Decision Block 946. Because 
this is a new protected conversation, conversation 
manager 53A initiates a protected conversation and
sends the LUWID associated with work unit Y to a 
communication facility (Step 947). In the illustrated 
example, where application partner 56D resides in 
a different system, communication facility 57A is 
utilized. However, it should be noted that if the 
application partner resided in another execution 
environment, for example 52B, within the same 
system 50A, then the communication function is 
provided by conversation manager 53A of system 
control program 55A, without involvement of com- 
munication facility 57A. When protected conversa- 
tion adapter 64A receives control back from con- 
versation manager 53A and the protected con- 
versation initiation request was indicated as suc- 
cessful, protected conversation adapter 64A regis- 
ters with syncpoint manager 60A (Step 948) and 
gives control back to application 56A. At this time 
application 56A sends a message to application 
56D requesting the update of resource 78D (Step 
949). However, the message is buffered in system 
50D until application 56D is initiated. After the 
message is sent, application 56A has no more work 
to do (Decision Blocks 940 and 941) and issues a 
commit on work unit Y (Step 942). Syncpoint man- 
ager 60A initiates a two-phase commit procedure 
(Step 944). 

When system control program 55D receives
the conversation initiation request from commu- 
nication facility 57A via communication facility 57D 
(Step 960 in Figure 8), system control program 
55D initiates execution environment 52D (Step 
962). Protected conversation adapter 64D obtains 
new work unit Z for execution environment 52D in 
which application 56D will run from execution environment
control program 61D. This work unit is
unique within execution environment 52D. Also, 
protected conversation adapter 64D tells the sync- 
point manager to associate the LUWID received 
with the initiated conversation to the new work unit, 
and then registers with syncpoint manager 60D 
under the new work unit (Step 966). (The flow of 
the conversation initiation request in Step 947 is 
from protected conversation adapter 64A to con- 
versation manager 53A, to communication facility 
57A, to communication facility 57D, to conversation 
manager 53D, and to protected conversation adapter
64D.) Application 56D is then started.

Next, application 56D makes a work request in
Step 930D, and in the illustrated example, the first
work request is to receive a message on the conversation.
Because the protected conversation already
has the LUWID, Decision Block 935D leads
to Decision Block 937D. Because this is a protected
conversation but not a new outbound protected
conversation (i.e., not an initiation of a new
protected conversation), Decision Blocks 937D and
946D lead to Step 949D in which the message is
received by application 56D.

In the illustrated example from Figure 6, the
protected conversation causes application 56D to
perform additional work, e.g., update a file within
resource 78D (via resource adapter 62D), and therefore
Decision Block 940D leads to Step 930D in
which application 56D makes a work request to
update resource 78D using work unit Z. If the
resource adapter requests the LUWID (Decision
Block 935D), the syncpoint manager returns the
LUWID to the resource adapter (Step 936D). It was
not necessary to invoke the recovery facility to
assign the LUWID since it was already assigned
and associated with the work unit in Step 966.
Because this work request does not involve a protected
conversation resource, Decision Block 937D
leads to Step 939D in which resource 78D is updated
according to the work request. Because resource
adapter 62D was not previously registered,
Decision Block 933D leads to Step 934D in which
resource adapter 62D is registered with syncpoint
manager 60D. Application 56D now needs to determine
when application 56A requests the commit of
the work. This is accomplished by application 56D
by doing a receive (work request) on the protected
conversation. Application 56D will get a return code
of Take_Syncpoint when application 56A has issued
the commit. Therefore, Decision Block 940D
leads to Step 930D in which application 56D issues
a receive on the protected conversation under work
unit Z. Since protected conversation adapter 64D does
not need the LUWID (Decision Block 935D), and
the work request involves a protected conversation
(Decision Block 937D), and the protected conversation
is not a new outbound conversation (Decision
Block 946D), the receive is done (Step 949D).
Since application 56D has no additional work to do
on work unit Z, Decision Block 940D will lead to
Decision Block 941D. When application 56A has
issued the commit (Decision Block 941D), application
56D will get a Take_Syncpoint return code on
the receive, and issue a commit (Step 942D). Next,
syncpoint manager 60D will initiate the commit
procedure (Step 944D). In the illustrated example,
this concludes the work request associated with
work unit Z, and Decision Block 950D leads to the
end of application 56D. At this time, application
56A receives control back from syncpoint manager
60A and ends.
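
The partner-side behavior described above, in which application 56D keeps issuing receives until it sees the Take_Syncpoint return code and then commits, might look like the following sketch. The `ProtectedConversation` class, its in-memory queue, and the function names are hypothetical; a real conversation would block on receive rather than read from a prefilled queue.

```python
from collections import deque

TAKE_SYNCPOINT = "Take_Syncpoint"
OK = "OK"


class ProtectedConversation:
    """Hypothetical in-memory stand-in for a protected conversation."""

    def __init__(self):
        self._queue = deque()

    def send(self, message):
        # Originating application sends ordinary data.
        self._queue.append((OK, message))

    def signal_commit(self):
        # Originating application has issued its commit.
        self._queue.append((TAKE_SYNCPOINT, None))

    def receive(self):
        # Returns (return_code, message). Raises IndexError if nothing is
        # queued, because this sketch does not model blocking.
        return self._queue.popleft()


def run_partner(conversation, apply_update, commit):
    """Receive until Take_Syncpoint arrives, then issue the commit."""
    while True:
        return_code, message = conversation.receive()
        if return_code == TAKE_SYNCPOINT:
            commit()
            return
        apply_update(message)


# Usage: application 56A sends one update request and then commits.
conversation = ProtectedConversation()
conversation.send("update resource 78D")
conversation.signal_commit()

events = []
run_partner(
    conversation,
    apply_update=lambda msg: events.append(("update", msg)),
    commit=lambda: events.append(("commit",)),
)
```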

Figure 9 (and Figures 3-5 above) illustrate the
timing of the commits in execution environments
52A and 52D according to the example used in this
invention. When the protected conversation is in a
send state relative to execution environment 52A,
application 56A issues a commit for work unit Y, as
previously described in Step 942 (Figure 7). When
execution environment 52D is in receive state for
the protected conversation, it receives a message
along with a return code of Take_Syncpoint from
execution environment 52A. It should be noted that
after receipt of the Take_Syncpoint return code,
application 56D should issue a commit as soon as
possible because this return code indicates that
application 56A has issued the commit and is waiting
for execution environment 52D to issue the
corresponding commit. Thus, after receipt of the
message on the protected conversation and the
return code, application 56D completes work on
other protected resources associated with the work
unit in System 50D to get those other resources
into a consistent state. After this is done, such that
all resources in System 50D associated with the
work unit Z are consistent, application 56D issues
the commit. Next, syncpoint managers 60A and 60D
implement respective two-phase commit procedures
for resources directly accessed by the respective
applications 56A and 56D. Even though
separate commits are invoked to commit those
resources which are directly accessed by the respective
applications, during the two-phase commit
processing each syncpoint manager reports syncpoint
status information to the other syncpoint manager.
For a more detailed description of syncpoint
processing, see "Coordinated Sync Point Management
of Protected Resources".

REGISTRATION OF RESOURCES FOR COMMIT 
PROCEDURE 

FIG. 10 schematically illustrates automatic and
generic registration of resources, where registration
is a facility that identifies protected resources to
synchronization point manager (SPM) 60. In each
application execution environment 52, the resource
adapter 62/64 and the SPM 60 participate in registration
on behalf of the application 56. In the
illustrated embodiment, the resource manager 63
and the resource 78 are located outside of this
environment.

In FIG. 10, the application 56 is shown as
having two parts, a work request and a commit
request. Both parts usually execute in the same
application execution environment. However, a
broken line between the two parts is shown in the
figure to indicate that the application may be distributed
and that the two request types may originate
from different environments.

Assume that an end user starts application 56 
by invoking the start facility of the system control 
program. The start facility builds the application 
execution environment 52, and loads and transfers 
control to the application 56. When the application 
56 starts to execute, there are no resources 78 yet 
registered with SPM 60. 

When the application 56 in FIG. 2 makes a 
work request (Steps 500/530 in FIGS. 3/5A) to
use a resource 78, this request invokes a specific 
adapter 62/64 associated with the resource 78. The 
general function of the adapter 62/64 is to connect 
the application 56 to the resource manager 63. In 
system 50 the resource adapter 62/64 is extended 
to include a registration sub-routine that automatically
registers with the sync point manager 60, and
an adapter sync point exit entry point that supports 
the two-phase commit procedure. 

The work request entry point indicates code 
lines in the adapter 62/64 that pass the work request
(e.g., to open a file, insert records into a database,
initiate a conversation, etc.) from the application
56 to the resource manager 63. These code
lines also interact with the registration sub-routine 
in the adapter 62/64 to do automatic registration. 
Registration informs SPM 60 that the resource 78 
is part of a work unit. Also, registration identifies 
the resource manager 63 to SPM 60. This consists 
specifically of telling SPM 60 the adapter sync 
point exit entry point, and the resource manager's 
object_recovery resource identifier. 

The adapter sync point exit entry point in- 
dicates code lines within the resource adapter 
62/64 to be used by the SPM 60's two-phase 
commit facility when a commit request is made 
(Steps 506/534 in FIGS. 3/5A). The object_recovery
resource identifier is the identifier used by the 
recovery facility 70, described in the below section 
entitled "Log Name Exchange for Protected Re- 
sources" (Step 225 of FIG. 26), to initiate a con- 
versation with the resource manager 63 in the 
event of a failure during the SPM 60 two-phase 
commit process. 
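
The automatic registration just described, in which the adapter's work request entry point registers the sync point exit and the object_recovery resource identifier with the SPM, can be modeled roughly as below. All class and attribute names are illustrative assumptions; the sketch registers before forwarding the request, and the forwarding to the resource manager is represented only by a list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Registration:
    sync_point_exit: Callable[[str], str]  # adapter sync point exit entry point
    object_recovery_id: str                # identifier used by the recovery facility


class SyncPointManager:
    """Hypothetical model of the SPM 60 registration facility."""

    def __init__(self):
        self.registry: Dict[Tuple[str, str], Registration] = {}

    def is_registered(self, work_unit, resource):
        return (work_unit, resource) in self.registry

    def register(self, work_unit, resource, registration):
        self.registry[(work_unit, resource)] = registration


class ResourceAdapter:
    """Hypothetical adapter 62/64 with a registration sub-routine."""

    def __init__(self, spm, resource, object_recovery_id):
        self.spm = spm
        self.resource = resource
        self.object_recovery_id = object_recovery_id
        self.forwarded: List[Tuple[str, str]] = []  # stands in for the resource manager

    def sync_point_exit(self, order):
        # Placeholder: a real exit would drive the resource manager.
        return "ok"

    def work_request(self, work_unit, request):
        # Registration sub-routine: register once per work unit and resource.
        if not self.spm.is_registered(work_unit, self.resource):
            self.spm.register(
                work_unit,
                self.resource,
                Registration(self.sync_point_exit, self.object_recovery_id),
            )
        # Pass the work request on to the resource manager.
        self.forwarded.append((work_unit, request))


spm = SyncPointManager()
adapter = ResourceAdapter(spm, resource="78A", object_recovery_id="RM63A")
adapter.work_request("X", "open file")
adapter.work_request("X", "insert record")
```

The application never calls the SPM directly in this sketch; registration is a side effect of its first work request, which is the sense in which the text calls registration automatic.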

The process initiated by a work request to any
resource adapter 62/64 to handle automatic registration
for the application 56 is resource dependent.
The resource 78 to be used can be inherently
protected regardless of the nature of the work
request, and if it has not yet registered, the adapter
62/64 uses its registration sub-routine to automatically
register the resource with SPM 60 for the
application 56. Alternatively, the adapter 62/64 may
not know if the resource 78 is protected. The
resource manager 63 may have this knowledge. In
this case, the adapter 62/64 may register and pass
the work request to the resource manager 63. The
resource manager 63 may do the work request and
return to the adapter 62/64 with an indicator whether
the resource 78 requires or does not require
protection. If protection is not required, the adapter
62/64 may use its registration sub-routine to unregister
with SPM 60. Or the adapter 62/64 may
determine inherently from the work request or from
the resource manager 63 that the resource will not
be changed by the application 56; that is, the
resource is used only for read. For this case, the
adapter 62/64 may use the registration facility of
SPM 60 to change the registration to read-only.
Finally, the adapter 62/64 may determine that the
resource 78 is a read-only resource or an unprotected
resource that should be made available to other
applications as soon as possible. In this case, the
adapter may remain registered in order to obtain
the prepare order during a two-phase commit procedure.
The resource adapter 62/64 can then use
the order as a cue to unlock the resource 78. In
this case the adapter 62/64 may respond
"prepared" and "commit" to the orders from SPM
60.

By supporting unregistration and change of
registration, as described in more detail below, the
adapter 62/64 can give information to SPM 60 that
allows for optimizing the two-phase commit procedure
(also, as described below). When the application
56 issues a commit request, the SPM 60 may
realize that only one resource is registered as
having been changed (either no other resource is
registered, or all other resources are registered as
read-only). For this case the SPM 60 may use the
more efficient one-phase commit process.
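
The optimization described in this paragraph reduces to a small decision over the registered modes. The following is a minimal sketch under the assumption that each registration is recorded as either "read-only" or "write"; the function name and the string values are illustrative only.

```python
def choose_commit_procedure(registrations):
    """Pick a commit procedure from a mapping of resource -> mode.

    Assumption for this sketch: mode is "read-only" or "write".
    One-phase commit suffices when at most one resource is registered
    as having been changed; otherwise two-phase commit is needed.
    """
    changed = [res for res, mode in registrations.items() if mode != "read-only"]
    return "one-phase" if len(changed) <= 1 else "two-phase"


only_one = choose_commit_procedure({"78A": "write"})
others_read_only = choose_commit_procedure({"78A": "write", "78B": "read-only"})
two_writers = choose_commit_procedure({"78A": "write", "78B": "write"})
```

Here `only_one` and `others_read_only` both evaluate to "one-phase", and `two_writers` to "two-phase", matching the two cases named in the text (no other resource registered, or all others registered as read-only).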

Now consider the foregoing general control
flow as applied to a specific example where application
56A of FIG. 2 is executing and makes a
work request for a protected conversation with a
partner application 56D (Step 530 of FIG. 5A). The
request is processed by protected conversation
adapter 64A, which is one type of resource adapter.
This adapter uses its registration sub-routine to
invoke the registration facility of SPM 60A (Step
532). Next the adapter 64A uses communication
facility 57A, which acts as a resource manager, to
initialize the partner application 56D. As illustrated
in FIG. 2, the conversation manager 53A is capable
of starting a partner application on the same system
50A, or of communicating with a counterpart
communication facility 57D on another system 50D
via communication facility 57A to start an application
within system 50D. In the latter case, the
partner application runs on system 50D and the
communication facility 57D starts the partner application
56D by invoking the system control program
55D's start facility. This facility builds the new
application execution environment 52D for the partner
application 56D. Since the start facility knows
that it is building a partner application 56D, it
knows that the communication facility 57D will be
used in the protected conversation with the originating
application 56A. Thus, the start facility temporarily
acts as the partner application 56D and
invokes the resource adapter 64D for protected
conversations. Then, adapter 64D registers the protected
conversation with the SPM 60D. Thus, the
partner application 56D's protected conversation
with the originating application 56A is registered
prior to the invocation of the partner (alternatively,
the registration could be delayed until the partner
application 56D uses the conversation with the application
56A). Thus, in FIG. 2, the SPM 60A within
execution environment 52A of the application 56A
and the SPM 60D within the execution environment
52D of the partner application 56D are each informed
of the protected conversation resource.

At this point in the discussion in FIG. 2, the 
application 56A and the partner application 56D are
each executing in their own execution environ- 
ments 52A and 52D under respective work units, 
and each may use one or more protected re- 
sources 78A or 78D. Each may, for example, use 
protected files. When the application 56A makes a 
request to use a file resource 78A, the file resource 
adapter 62A is invoked. The adapter uses its reg- 
istration sub-routine to invoke the SPM 60A reg- 
istration facility. Then the adapter invokes the file 
resource manager 63A. Thus, again, application 
56A's usage of a protected resource 78A is auto- 
matically registered. Analogous registrations can be 
made in execution environment 52D for one or 
more resources such as resource 78D. 

From the above examples we see that this 
embodiment of registration is generic because reg- 
istration does not depend on resource type. In FIG. 
10, any resource manager 63 that wants to support
a protected resource 78 may add the registration
subroutine to its resource adapter 62/64. No 
changes would be required to the system 50 sync 
point support. 

In FIG. 10, the application 56 may also use 
non-protected resources. For example, the applica- 
tion may want to create a non-protected partner 
application that periodically displays messages 
about the work being done, where the display need 
not be synchronized with the actual completion of 
work. For this case, the application 56 makes a 
work request to have a non-protected conversation. 
The control flow is much the same as for a pro- 
tected conversation in the above example. The 
only difference is that the resource adapter 64 
knows from information in the work request that the 
conversation is not protected and in the illustrated 
embodiment, does not register with the SPM 60. 
Thus, the non-protected conversation will not participate
in the synchronization point processing of
SPM 60.

In FIG. 10, given the registration process de- 
scribed above, whenever the application 56 issues 
a commit request, the SPM 60 has a complete list 
of protected resources that need to be synchronized.
See the foregoing section entitled
"Coordinated Sync Point Management of Protected
Resources", where the two-phase commit procedure
in SPM 60 is described. This shows how SPM
60 uses the adapter sync point exit entry points in 
the resource adapter 62/64 to use the sync point 
support in the resource managers 63. Although not 
shown in FIG. 10, the application 56 may issue a 
back out request. For this case, the SPM 60 gives 
a back out order to the adapter sync point exit 
entry point in the resource adapter 62/64. 
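
How the SPM might drive the registered exits on a commit or back out request can be sketched as follows. The source does not spell out the voting logic in this section, so the sketch uses the textbook two-phase commit pattern as an assumption: every exit receives a prepare order, and commit orders go out only if all reply "prepared"; otherwise a back out order is sent. The function names, order strings, and the choice to send back out to every exit are illustrative.

```python
def back_out(exits):
    """Give a back out order to every registered adapter exit."""
    for exit_routine in exits:
        exit_routine("backout")
    return "backed out"


def two_phase_commit(exits):
    """Textbook two-phase commit over a list of adapter exit callables.

    Assumption: each exit takes an order string and replies "prepared"
    to a prepare order when its resource manager can commit.
    """
    # Phase one: collect votes from all exits.
    votes = [exit_routine("prepare") for exit_routine in exits]
    if any(vote != "prepared" for vote in votes):
        return back_out(exits)
    # Phase two: everyone voted yes, so order the commit.
    for exit_routine in exits:
        exit_routine("commit")
    return "committed"


def make_exit(name, log, vote="prepared"):
    """Build a hypothetical exit that records the orders it receives."""
    def exit_routine(order):
        log.append((name, order))
        return vote if order == "prepare" else "done"
    return exit_routine


ok_log = []
ok_result = two_phase_commit([make_exit("62A", ok_log), make_exit("64A", ok_log)])

bad_log = []
bad_result = two_phase_commit(
    [make_exit("62A", bad_log), make_exit("62B", bad_log, vote="failed")]
)
```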

At the end of the synchronization point process,
each SPM 60 does not destroy the application
56's registration list. It does, however, invoke
the resource adapter's exit one more time for post 
synchronization processing. For this invocation, the 
adapter may decide to modify its registration. For 
performance reasons, the adapter may keep the 
resource registered until the application 56 ends. 
On the other hand, if the adapter knows that the 
resource 78 will no longer be used (for example, a 
protected conversation may end before the applica- 
tion 56 ends), the adapter may use its registration 
entry point 62 to unregister with SPM 60. 

The control flows above assumed distributed 
resource managers 63. Thus, any request to use a 
resource 78 always went to the appropriate re- 
source adapter 62/64 which, in turn, invoked the 
registration facility in SPM 60 and the work request 
in the distributed resource manager 63. However, 
for the case where the resource manager 63 is not 
distributed, the adapter need not get involved with 
a work request. For this case, since resource man- 
ager 63 and SPM 60 are in the same application 
execution environment 52, the resource manager 
63 may directly invoke the registration facility in 
SPM 60. 

In the illustrated example of FIGURE 12, ap- 
plication 56A makes multiple work requests. They 
are processed by system 50A concurrently and 
involve more than one resource manager and re- 
source. Specifically for the example, application 
56A makes eight work requests for two work units, C
and D, that are processed concurrently by system 
50A. The commit points, shown in FIGURE 13, are 
at times 19 and 44 for work unit C and at time 33 
for work unit D. The time units in FIGURE 13 are 
logical clock units denoting sequence (not physical
clock units). In the illustration of FIGURE 13, events
shown as occurring at the same time indicate that
their order is not important.

A work unit is an application's understanding,
or scope, of which resources participate in a synchronization
point. An application can specify for
which work unit changes to protected resources are
made. An application can also specify under what
work unit protected conversations are initiated.
System 50A permits multiple work units in the
application execution environment (52A in FIGURE
12). Specifically, applications, sync point manager
60A, and protected adapters (e.g., SQL Resource
Adapter in FIGURE 12) can support multiple concurrent
work units. System 50A also permits tying
together the work units of two application execution
environments via a protected conversation. Each
work unit can have a series of synchronization
points. A synchronization point request to a work
unit does not affect activity on other work units in
an application's environment.

Consider the following example illustrated in
FIGURES 12 and 13. Mr. Jones of Hometown wishes
to make a transfer to his son's trust fund. The
security department for Mr. Jones' bank keeps
track of all people involved in any transaction, including
both customers and employees. The security
log and financial records are not in a mutual
"all or nothing" embrace, but the two work units
may need to be processed concurrently; one reason
could be that response time would be too slow
if the two work units were processed serially.

In the illustrated example, the work request for
work unit C at time 1 involves resource manager
63A which controls the security log in the bank's
headquarters in Chicago. Unprotected conversation
1 is used by resource adapter 62A to communicate
with resource manager 63A. The work request for
work unit D at time 1 also involves resource manager
63A in Chicago for Mr. Jones' trust fund, while
the request at time 7 is to resource manager 63B
in Hometown where Mr. Jones' other financial
records are kept. Unprotected conversation 2 is
used by resource adapter 62A to communicate
with resource manager 63A and unprotected conversation
3 is used by resource adapter 62B to
communicate with resource manager 63B.

When application 56A writes its first record, a
"start security event" message, using work unit C
(Step 612 in FIGURE 14), resource manager 63A
registers via its resource adapter 62A in application
execution environment 52A. Sync point manager
60A builds a registry entry for resource manager
63A in FIGURE 12 table 126 under work unit C
(Step 614). This entry contains the parameter list to
pass to the exit for resource adapter 62A, which
includes the routine name of the exit and a special
and private value that resource adapter 62A passed
on registration. The resource adapter exit can use
the special value to locate its control block for
conversation 1.

Consequently, when application 56A requests a
commit at time 19 for work unit C, sync point
manager 60A reads table 126 to determine which
resource adapter exits should be notified to initiate
the commit procedure. In the illustrated embodiment,
at time 19 when commit is requested for
work unit C, synchronization point manager 60A
calls the exit routine for resource adapter 62A to
initiate a one-phase commit procedure since only
one protected resource is registered; resource
adapter 62A's exit routine knows to use conversation
1 to communicate with resource manager 63A
since it receives from synchronization point manager
60A the special value saved in table 126
during registration.
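
The per-work-unit registry and the private value handed back to the exit can be illustrated with the sketch below. The table layout, the `adapter_exit` function, and the use of small integers as the private value are assumptions for the example; the conversation numbers follow the bank example in the text.

```python
class SyncPointManager:
    """Hypothetical model of the per-work-unit registration tables."""

    def __init__(self):
        # work unit -> {resource manager: (exit routine, private value)}
        self.tables = {}

    def register(self, work_unit, resource_manager, exit_routine, private_value):
        self.tables.setdefault(work_unit, {})[resource_manager] = (
            exit_routine,
            private_value,
        )

    def commit(self, work_unit):
        # Notify only the exits registered under this work unit, passing
        # back the private value each adapter supplied at registration.
        return [
            exit_routine("commit", private_value)
            for exit_routine, private_value in self.tables.get(work_unit, {}).values()
        ]


# Adapter-side control blocks, keyed by the private value (assumed layout).
control_blocks = {1: "conversation 1", 2: "conversation 2", 3: "conversation 3"}


def adapter_exit(order, private_value):
    # The exit uses the private value to find the right conversation.
    return f"{order} sent on {control_blocks[private_value]}"


spm = SyncPointManager()
spm.register("C", "63A", adapter_exit, 1)  # security log, work unit C
spm.register("D", "63A", adapter_exit, 2)  # trust fund, work unit D
spm.register("D", "63B", adapter_exit, 3)  # other financial records, work unit D

commit_c = spm.commit("C")
commit_d = spm.commit("D")
```

Committing work unit C touches only the entry for conversation 1, so the registrations under work unit D are left alone, which is the independence between work units that the example relies on.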

Registration is subsequently avoided (Step
613) at time 26 when logging the employee id of
the bank clerk handling Mr. Jones' transaction.
Re-registration is not required because sync point
manager 60A already knows from the work unit
registration table 126 that resource manager 63A
is participating in work unit C. Consequently, the
processing of each work request for work unit C
after the first work request and the subsequent
commit at time 44 is expedited. Also, at each
synchronization point for work unit C, only resource
adapter 62A and resource manager 63A are notified;
there is no time wasted notifying other resource
adapters or other resource managers.
source adapters or other resource managers. 

When application 56A makes work requests at
times 1 and 7 under Work Unit D, both resource
adapters 62A and 62B register with sync point
manager 60A which adds registry entries 63A and
63B to table 127.

When the first security log commit is done at
time 19, the trust fund update at time 17 is not
affected in any way. When the trust fund and
financial records are committed at time 33, the
clerk-id message is not affected either. Note that
resource manager 63A in Chicago is not confused
since it is communicating on two separate conversations,
1 and 2, to application 56A.

The development of a resource adapter is simplified
because system 50A knows which work
units are active for the resource manager, relieving
the resource adapter of that task. Since the design
is simple, the resource adapter exit performs well; it
has everything it needs and simply sends sync
point manager 60A's actions to its resource manager.
Another performance perspective is that sync
point manager 60A can optimize synchronization
point procedures because it knows for which work
units the resource manager is active, avoiding the
overhead of calling resource adapters or resource
managers for resources which are not involved in
synchronization points.

In system 50A, there may be occasions when
the type of work request made on a protected
resource, such as a shared file or database,
changes the state of the resource such that the
registration information should be changed. This is
important because an original work request may be
a read-only request and require only a one-phase
commit procedure, but a subsequent related work
request under the same work unit may be a write
request and require a two-phase commit procedure
in order to coordinate the multiple protected
resources involved.

As another example illustrated in FIG. 3, an 
application 56A typically makes one or more read 
requests on a file before making a write request in 
order to locate a particular record in the file to 
update. Such read operations can be implemented 
using a one-phase commit procedure in which 
case, upon receipt of the read work request by 
resource adapter 62A (Step 500), the resource 
adapter registers with syncpoint manager 60A for 
read mode (Step 502). It should be noted that 
during subsequent read operations, the resource 
adapter 62A need not interact with syncpoint man- 
ager 60A because there is no change in the type of 
commit procedure that is required. However, when 
application 56A subsequently makes a write re- 
quest to resource adapter 62A under the same 
work unit (Step 504), resource adapter 62A 
changes its registration status with syncpoint man- 
ager 60A to write mode. As described in more 
detail below, the rather time-consuming two-phase 
commit procedure will be used if more than one 
protected resource is registered for write mode on 
the same work unit. 

This example of registration change is illus- 
trated in detail by the flow chart of FIG. 11. When 
the work request in step 580 is the first one for the 
protected resource and the request is read-only, 
decision block 581 leads to decision block 582. It 
should be noted that the resource adapter 62A 
keeps an internal indicator for each resource under 
each work unit for which it has already registered. 
This indicator is tested in decision block 581. The 
resource is not a protected conversation, therefore 
decision block 582 leads to decision block 583. 
Because the work is read-only, decision block 583
leads to step 585. In step 585, the corresponding 
resource adapter 62A registers as a read-only re- 
source. When the next work request to step 580 is 
to write into, or update, the same resource under 
the same work unit, decision block 581 leads to 
decision block 584 because the resource adapter 
62A previously registered in step 585, albeit for 
read mode. Decision block 584 leads to decision 
block 586 because the resource is not a protected 
conversation, and decision block 586 leads to de- 
cision block 588 because the request is for update 
mode. Next, decision block 588 leads to step 590 
where the resource adapter 62A (which had previously
registered in step 585 for read mode)
changes its registration within syncpoint manager
60A to write mode. It should be noted that according
to FIG. 11, if the first work request under a
work unit for the resource is write mode, then the 
resource adapter 62A registers for write mode in 
step 592. 
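
The read-to-write registration change of FIG. 11 can be sketched as follows. This is a simplified illustration that omits the protected-conversation branches (decision blocks 582 and 584); all class and method names are hypothetical, and the step numbers in the comments only indicate which part of the flow chart each line is meant to mirror.

```python
class SyncPointManager:
    """Hypothetical stand-in that records the mode of each registration."""

    def __init__(self):
        self.mode = {}   # (work_unit, resource) -> "read" or "write"
        self.calls = 0   # number of adapter-to-manager interactions

    def register(self, work_unit, resource, mode):
        self.calls += 1
        self.mode[(work_unit, resource)] = mode

    def change_registration(self, work_unit, resource, mode):
        self.calls += 1
        self.mode[(work_unit, resource)] = mode


class ResourceAdapter:
    def __init__(self, spm):
        self.spm = spm
        # Internal indicator per resource and work unit (tested in block 581).
        self.registered = {}

    def work_request(self, work_unit, resource, is_write):
        key = (work_unit, resource)
        wanted = "write" if is_write else "read"
        current = self.registered.get(key)
        if current is None:
            # First request: register for read (step 585) or write (step 592).
            self.spm.register(work_unit, resource, wanted)
            self.registered[key] = wanted
        elif current == "read" and is_write:
            # Previously registered for read, now writing (step 590).
            self.spm.change_registration(work_unit, resource, "write")
            self.registered[key] = "write"
        # Otherwise the required commit procedure is unchanged,
        # so the adapter does not contact the sync point manager.


spm = SyncPointManager()
adapter = ResourceAdapter(spm)
adapter.work_request("WU1", "fileA", is_write=False)  # registers for read
adapter.work_request("WU1", "fileA", is_write=False)  # no interaction
adapter.work_request("WU1", "fileA", is_write=True)   # changes to write
adapter.work_request("WU1", "fileA", is_write=False)  # no interaction
```

Four work requests result in only two interactions with the sync point manager in this sketch.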

There is also the situation of a resource man- 
ager 63 which has completed a sync point and has 
had no further requests since completing that sync 
point. Its resource adapter 62 is allowed to modify 
its registration status to "suspended", at the com- 
pletion of a sync point procedure, so that the sync 
point manager 60 will know that resource manager 
63 is currently not participating in any sync points 
for the work unit. The suspension of a write mode 
resource may permit sync point manager 60 to 
optimize a subsequent commit procedure (one- 
phase commit) for the remaining resources when, 
for example, there is only one other write mode 
resource in the work unit. If the suspended re- 
source adapter 62 receives a new work request for 
the work unit, it can reactivate its registration 
through the same registration modification function. 

The designs of certain resource managers re- 
quire that their resource adapters register early in 
their interaction with the application in order to be 
notified of distributed sync point activities. How- 
ever, they may not have a complete set of registra- 
tion information at that time. For example, the pro- 
tected conversation adapter 64A needs to register 
at the point that it initiates a protected conversation 
with a partner application 56D because it needs to 
know if a sync point occurs, yet it will not have all 
registration information until the conversation part- 
ner accepts the conversation, an event which may 
occur much later. This information can be added 
later under the foregoing change of registration 
process illustrated in step 590. 

System 50 provides additional time-saving 
techniques in the registration process. When each 
resource adapter 62 registers a first time with sync- 
point manager 60, it registers information in addi- 
tion to the identification of the resource manager 
63 and the resource adapter exit routine name for 
sync point processing. Much of this additional
information usually does not change when the
registration changes. Consequently, this additional
information is not re-registered when the registration
changes in step 590 for a resource adapter 62. The 
following is a list of some of the additional informa- 
tion which the resource adapter 62 registers only 
once with the syncpoint manager and which does 
not change when other registration information 
changes: 

1. Resource and network identifiers which describe
where the resource manager and resource are
located in the system and the network;

2. Product identifier which indicates the product
and thus the type of resource, e.g., shared file,
database, protected conversation, etc.; and

3. Additional data which is required for
resynchronization.

Because this additional information is not re- 
registered each time, the registration process is 
expedited. 
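
One way to picture this split is a registration entry whose once-only part is kept separate from the part that changes. The sketch below is an assumption-laden illustration: the field names are invented for the example and only loosely correspond to the three items listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRegistration:
    """Registered once; hypothetical field names for items 1 to 3."""
    resource_and_network_ids: str
    product_id: str
    resync_data: bytes


@dataclass
class Registration:
    static: StaticRegistration  # never re-sent on a registration change
    mode: str                   # "read" or "write"; may change later


def change_registration(entry: Registration, new_mode: str) -> Registration:
    # Only the changeable part is supplied; the static part is reused.
    entry.mode = new_mode
    return entry


static = StaticRegistration("node7/fileserver", "shared-file", b"\x00\x01")
entry = Registration(static=static, mode="read")
changed = change_registration(entry, "write")
```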

There are a variety of occasions when an application
can or will no longer use a protected
resource. Examples include such events as end of
application, termination of a resource manager, or
unavailability of the path to the resource manager.
There may be application/resource manager protocols
which allow the application to declare a
resource to no longer be in use. The application
execution environment may support protocols
which make it appropriate to unregister resources
prior to end of application. Protected conversations
may also terminate due to application action or due
to an error condition such as a path failure. Upon
any such occasion, it is preferable for the resource
adapter or protected conversation adapter to
unregister all applicable instances of the resource
from the syncpoint manager because such
unregistration will make subsequent syncpoint
processing more efficient (fewer resources to consider
and probably less memory consumed) (step 618 of
FIGURE 14). In addition, the resource adapter or
protected conversation adapter can delete any
control information about the registered resource and
thus be more efficient in its subsequent processing.

FIGURE 15 shows the flow of unregistration
activity when a resource adapter 62 or a protected
conversation adapter 64 discovers that a resource
78 or protected conversation is not available (step
904) or that the application has ended (step 903).
Note that the adapter would typically discover that
the resource is not available while processing an
application work request (step 902). The adapter
would determine from its own resource registration
status information what registered resources should
be unregistered (step 906). For each such registered
resource, the adapter would call the syncpoint
manager 60 to unregister the resource (step
907). Note that the adapter must identify the
resource and the work unit to the syncpoint manager
60.

In FIGURE 15, for each call to syncpoint manager
60 (step 910), the syncpoint manager 60 uses
the adapter-supplied work unit identifier to locate
the work unit resource table (step 911). Within this
work unit resource table, the syncpoint manager 60
uses the adapter-supplied resource identifier to
locate the desired resource entry (step 912). The
syncpoint manager 60 then flags the resource entry
as unregistered (step 913) and returns to the
calling adapter (step 914 back to step 907). However,
the syncpoint manager 60 cannot yet erase
the unregistered resource entry because the
resource entry logically contains error information
which must be preserved until the next
synchronization point (see "Coordinated Handling of Error
Codes and Information Describing Errors in a
Commit Procedure").

The adapter can now delete its control informa- 
tion (or otherwise mark it as unregistered) about 
the unregistered resource (step 908). Note that an 
event which causes unregistration may cause mul- 
tiple resource registrations to be deleted (for exam- 
ple, a resource may be registered for multiple work 
units). Thus, steps 906, 907, and 908 can be a 
program loop to handle each applicable previously 
registered resource. At this point, the adapter can 
return to its caller (step 909). If the work request 
has failed due to an unavailable resource, the 
adapter can report the error condition to the ap- 
plication by whatever mechanism the resource 
adapter has chosen to return error information to its 
application users. 

The resource adapter may have other process- 
ing considerations as a result of the unavailable 
resource or the application termination. For exam- 
ple, if the unavailable resource condition will cause 
the backout of resource updates, the adapter will 
need to notify the application and/or the syncpoint 
manager 60 that the next syncpoint on the ap- 
plicable work unit(s) must be a backout. This con- 
dition during syncpoint processing requires the 
adapter to notify syncpoint manager 60 of the 
resource status (which is backing out). There may 
be other resource, environment, or implementation 
dependencies. 

Syncpoint manager 60 is now concerned with 
handling the flagged unregistered resources (from 
step 913) so that they are ignored for normal 
operation and so that they are eventually erased. 
Syncpoint manager 60 can erase flagged un- 
registered resource entries at the beginning of the 
next syncpoint for the affected work unit. FIGURE 
16 describes the syncpoint process flow within 
syncpoint manager 60. When the next syncpoint 
process reads the registered resource table (step 
622), it can erase any flagged unregistered resource
entries in that table (an action not shown in
FIGURE 16). Because step 622 builds all syncpoint 
resource participation lists for the duration of the 
current syncpoint process, resource unregistrations
and modifications of resource registry entries by 
adapters will not affect the current syncpoint pro- 
cess. At this point, the total unregistration process 
is complete. 
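
The two-stage unregistration (flag now, erase at the start of the next syncpoint) can be sketched as below. The class, its methods, and the dictionary-based tables are hypothetical; the step numbers in the comments indicate the parts of FIGURES 15 and 16 that each line is meant to mirror.

```python
class SyncPointManager:
    """Hypothetical stand-in for syncpoint manager 60."""

    def __init__(self):
        # work unit -> {resource id -> entry}
        self.work_units = {}

    def register(self, work_unit, resource):
        table = self.work_units.setdefault(work_unit, {})
        table[resource] = {"unregistered": False, "error_info": None}

    def unregister(self, work_unit, resource):
        table = self.work_units[work_unit]   # step 911: locate the table
        entry = table[resource]              # step 912: locate the entry
        entry["unregistered"] = True         # step 913: flag, do not erase

    def begin_sync_point(self, work_unit):
        # Step 622: read the registered resource table. Flagged entries
        # are erased here, after their error information is no longer needed.
        table = self.work_units[work_unit]
        for resource in [r for r, e in table.items() if e["unregistered"]]:
            del table[resource]
        # The returned list is a snapshot for the current syncpoint, so
        # later registry changes do not affect the syncpoint in progress.
        return sorted(table)


spm = SyncPointManager()
spm.register("WU1", "fileA")
spm.register("WU1", "dbB")
spm.unregister("WU1", "dbB")
still_present = "dbB" in spm.work_units["WU1"]  # entry kept until syncpoint
participants = spm.begin_sync_point("WU1")
```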

OPTIMIZATION OF COMMIT PROCEDURES 



Each participating resource manager is capable
of performing the two-phase commit procedure,
such as the two-phase commit procedure described
by System Network Architecture LU 6.2:
Peer Protocols, SC31-6808, Chapter 5.3
Presentation Services - Sync Point verbs, and may
or may not be capable of performing the one-phase
commit procedure. The two-phase commit
procedure is important to protect resources; however,
the two-phase commit procedure is a relatively
complex and time-consuming process compared
to the one-phase commit procedure. For
example, as described in more detail below, the
two-phase commit procedure requires the
time-consuming step of logging information about the
sync point participants in the recovery facility log
72 (FIG. 2), whereas the one-phase commit procedure
does not require such logging. Also, the two-phase
commit procedure requires two invocations
of the resource adapter coordination exit to perform
the commit, whereas the one-phase commit procedure
requires only one such invocation to commit
data. A "resource adapter coordination exit" is the
mechanism for the sync point manager 60 (FIG. 2)
to provide information to the associated resource
manager. To make the system operate as
expeditiously as possible, the sync point manager
utilizes the two-phase commit procedure only when
necessary. In summary, the sync point manager utilizes
the two-phase commit procedure whenever a
protected conversation is involved, or at least two
resources are in update mode, or one or more
participating resource managers is not capable of
performing the one-phase commit procedure.
Whenever all resources are capable of performing
the one-phase commit procedure and no more than
one resource is in update mode, the sync point
manager utilizes the one-phase commit procedure.
Also, if any resource is in read-only mode such
that the data in the resource is read and not
updated and the resource manager is capable of
performing the one-phase commit procedure, then
a one-phase commit procedure is used for this
resource regardless of the type of commit procedure
used for the other resources. A key component
of this optimization is the ability of the resource
manager and its resource adapter to determine,
prior to the synchronization point, the state of the
resource as defined by the work request, that is,
whether the resource is in read-only mode or in
update mode. When a resource is in read-only mode,
it means that the application has only read data
from the resource. When a resource is in update
mode, this means that the application has changed
the data in the resource.

The optimization process begins as follows. 
Application 56 (FIG. 2) makes a work request to a 
resource (step 612 of FIG. 14). If this is the first
work request for a particular work unit (decision
block 613 in FIG. 14), the resource adapter 62 
(FIG. 2) associated with the resource registers with 
the synchronization point manager the fact that it is 
now an active, participating resource for the work 
unit (step 615 in FIG. 14). One of the pieces of 
information about the resource that must be pro- 
vided at registration time (step 616 in FIG. 14) is 
whether the associated resource manager is ca- 
pable of performing the one-phase commit proce- 
dure, e.g., is the resource a database manager 
which under certain circumstances could perform a 
one-phase commit procedure. Also during registra- 
tion, the resource adapter records with the sync 
point manager whether the work request made by 
the application placed the resource in the read-only 
mode or update mode (step 616 in FIG. 14). 

After the initial registration of a resource, sub- 
sequent work requests made by the application 
against that resource may change the state of the 
resource. That is, the resource may change from 
read-only to update mode. When these changes 
occur, the resource adapter must inform the sync 
point manager about these changes, and the reg- 
istration information is updated to reflect the new 
state (step 619 in FIG. 14). 

If the work request from the application is for a 
protected conversation, the registration entry for 
the protected conversation adapter will always 
show that the protected conversation adapter is in 
update mode and that it is not capable of perform- 
ing a one-phase commit procedure. Since the pro- 
tected conversation adapter represents a commu- 
nication path to another application execution envi- 
ronment, which may involve a plurality of re- 
sources, it is not possible for the protected con- 
versation adapter to determine accurately if it re- 
presents a communication path to read-only mode 
resources or to update mode resources. Therefore, 
the presence of a communication path to another 
application execution environment requires the two- 
phase commit procedure, to provide the necessary 
protection of the critical resources. The protected 
conversation adapter insures that the two-phase 
commit procedure will be used by registering as an 
update mode resource that is not capable of per- 
forming the one-phase commit procedure. 

After the application has completed all its work, 
it will attempt to either commit or back out the data 
at the resources. To accomplish this, the applica- 
tion issues a sync point request to the sync point 
manager. To start processing the sync point request
(step 620 in FIG. 16), the sync point manager
reads the work unit table to find the entry for
the affected work unit (step 621 in FIG. 16). For
more information on work units, see "Local and
Global Commit Scopes Tailored To Work Unit".
Once the correct work unit entry is located, the
sync point manager reads the information in that
entry about the resources registered for that work
unit and creates three lists of resources (step 622
in FIG. 16).

Each of these lists has a different meaning.
The read-only list contains those resources whose
data has only been read by the application. The
update list contains those resources whose data
has been changed by the application and those
resources that are in read-only state but whose
resource manager is not capable of performing the
one-phase commit procedure. The initiator list
contains the list of communication partners that have
sent a message that they want to synchronize
updates to resources. Each resource may appear
in only one of the lists.

In practice, the registration for each resource
includes two flags which are read by the sync point
manager and used to determine if a resource
should be entered into the update list or the
read-only list. The first flag is on when the resource is in
read-only mode, and is off when the resource is in
update mode. The second flag is on when the
resource supports both the one-phase commit
procedure and the two-phase commit procedure, and
is off when the resource is capable of performing
only the two-phase commit procedure. In practice,
the registration for each resource also includes a
field that contains information about whether this
resource adapter received a message from a
communication partner indicating that it wants to
synchronize updates to resources. The sync point
manager reads this field and uses the data to
determine if the resource should be entered into
the initiator list.
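
The construction of the three lists from the two flags and the initiator field can be sketched as follows. The record layout and names are hypothetical. One point is an assumption of this sketch rather than something the text states: a registration whose initiator field is set is placed in the initiator list before the flags are examined, which is one way to keep each resource in only one list.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    name: str
    read_only: bool                  # first flag: on in read-only mode
    one_phase_capable: bool          # second flag: on if one-phase supported
    partner_initiated: bool = False  # field: partner asked to synchronize


def build_lists(registrations):
    """Return (read_only, update, initiator) lists of resource names."""
    read_only, update, initiator = [], [], []
    for reg in registrations:
        if reg.partner_initiated:
            initiator.append(reg.name)   # assumed to take precedence
        elif reg.read_only and reg.one_phase_capable:
            read_only.append(reg.name)
        else:
            # Updated resources, plus read-only resources whose manager
            # can perform only the two-phase commit procedure.
            update.append(reg.name)
    return read_only, update, initiator


registrations = [
    Registration("file", read_only=True, one_phase_capable=True),
    Registration("db", read_only=False, one_phase_capable=True),
    Registration("legacy", read_only=True, one_phase_capable=False),
    Registration("conv", read_only=False, one_phase_capable=False,
                 partner_initiated=True),
]
read_only_list, update_list, initiator_list = build_lists(registrations)
```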

Once the lists of resources have been built, the
sync point manager examines the sync point request
type (decision block 623 in FIG. 16). If the
sync point request is to back out, the sync point
manager performs backout processing as follows.
First, all the resource adapters in the update list, if
any, are told to back out the changes to their
resource (step 626 in FIG. 16). Then, all the resource
adapters in the read-only list, if any, are told
to back out the effects on their resource (step 627
in FIG. 16). It should be noted that the processing
of a "backout" for a read-only resource is defined
by the resource implementation, since there are no
changes to the actual data in the resource to be
backed out. For example, processing for a backout
of a read-only file in a shared file resource manager
63 (FIG. 2) could include closing the file and
discarding any file positioning information previously
maintained for the application's use. After the
read-only resources are told to back out, then all
the resource adapters in the initiator list, if any, are
told that this application execution environment
backed out the changes for this synchronization
point (step 628 in FIG. 16).

If instead the sync point request is to commit
(decision block 623 in FIG. 16), then the sync point
manager starts the optimization process for the
commit. The first step in the optimization process
is to determine if the initiator list is not empty
(decision block 624 in FIG. 16). If the initiator list is 
not empty, this means that this application execu- 
tion environment is a cascaded initiator in the sync 
point tree, and that the full two-phase commit pro- 
cedure must be used for this commit. This is 
necessary because neither application execution 
environment knows the full scope of the sync point 
tree, that is, how many resources are active and in 
update mode for this synchronization point. Since 
the number is not known, the two-phase commit 
procedure must be used, to provide the necessary 
protection of these critical resources. 

If the initiator list is empty (decision block 624 
in FIG. 16), the next step is to determine if more 
than one resource is in the update list (decision 
block 625 in FIG. 16). If this is true, then the full 
two-phase commit procedure must be used for this 
commit. The two-phase commit procedure provides 
more protection for the update mode resources, 
because no resource commits its changes until all 
resources have voted that they can commit their 
changes. 

If there are fewer than two resources in the
update list (decision block 625 in FIG. 16), the next
step is to determine if there are zero or one
resources in the update list (decision block 640 in
FIG. 16). If there are
zero resources in the update list, then the one- 
phase commit procedure will be used to commit 
the read-only resources. Likewise, if there is ex- 
actly one resource in the update list, and its re- 
source manager is capable of performing the one- 
phase commit procedure, then the one-phase com- 
mit procedure will be used. 
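
The chain of decision blocks 624, 625, and 640 can be summarized in a small function. This is a sketch under stated assumptions: the function name, the argument layout, and the string return values are invented for illustration, and the case of a single update-list resource that cannot perform the one-phase procedure is mapped to the two-phase procedure in line with the summary rule given earlier in this section.

```python
def choose_commit_procedure(update, initiator, one_phase_capable):
    """Pick a commit procedure from the lists built in step 622.

    update / initiator: lists of resource names.
    one_phase_capable: dict mapping resource name -> bool.
    """
    if initiator:
        # Decision block 624: cascaded initiator, full scope unknown.
        return "two-phase"
    if len(update) > 1:
        # Decision block 625: more than one update mode resource.
        return "two-phase"
    if len(update) == 1 and not one_phase_capable[update[0]]:
        # Decision block 640: the single resource cannot do one-phase.
        return "two-phase"
    # Zero update resources, or exactly one that supports one-phase.
    return "one-phase"


only_readers = choose_commit_procedure([], [], {})
single_writer = choose_commit_procedure(["db"], [], {"db": True})
single_legacy = choose_commit_procedure(["old"], [], {"old": False})
two_writers = choose_commit_procedure(
    ["db1", "db2"], [], {"db1": True, "db2": True})
cascaded = choose_commit_procedure([], ["conv"], {})
```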

The one-phase commit procedure starts by the 
sync point manager telling the resource adapters in 
the update list, if any, to commit their changes 
(step 641 in FIG. 16). It should be noted that the 
one-phase commit of data by the resource man- 
ager is achieved by only one invocation of the 
resource adapter, in contrast with the two invoca- 
tions needed during the two-phase commit proce- 
dure. Since there can be only zero or one re- 
sources in update mode in the entire synchroniza- 
tion point, there is no chance of data inconsistency 
caused by different decisions for different re- 
sources. Also note that during the one-phase com- 
mit procedure, there is no writing to the recovery 
facility log 72 (FIG. 2), as opposed to the required 
logging that is part of the two-phase commit proce- 
dure (steps 644, 648, 651, 658, 659 of FIG. 17). 
The one-phase commit procedure ends with the
sync point manager telling the resource adapters in
the read-only list, if any, to commit their changes
(step 642 in FIG. 16). It should be noted that a
"commit" of a read-only resource is defined by the
resource implementation, since there are no actual
changes to the data to be committed. For example,
some shared file resource managers 63 (FIG. 2)
provide read consistency, so when an application
reads a file in a shared file resource manager, the
application is provided with a consistent image of
the file, that is, changes made to the file by other
application environments will not interfere with the
reading of the contents of the file, as they existed
at the time the file was opened. When the application
opens the file with the intent of read, the image
is created by the resource manager, which is
considered to be a read-only resource. When the
application is done reading the file, it closes the file
and attempts a commit. When the shared file resource
manager performs the commit as a read-only
resource, it could discard the image maintained
for the application's use. Now, if the application
opens the file again, it will see an image of the
file which contains all committed updates made by
other applications.

If the sync point request results in a two-phase
commit procedure according to the outcome of
decision blocks 624, 625, or 640 of FIG. 16, the
sync point manager 60 (FIG. 2) still optimizes the
commit of the read-only resources. There are several
parts to this optimization for the read-only
resources. First, (step 644 of FIG. 17) information
about the read-only resources is not written to the
recovery facility log 72 (FIG. 2). Information about
the read-only resources does not have to be
logged at the recovery facility 70 (FIG. 2) because
the read-only resources will never log the state of
"In-doubt" on their own logs. This means that the
resource manager will never attempt to resynchronize
with the recovery facility 70 (FIG. 2), so the
recovery facility does not need any knowledge
about the resource. Second, the read-only resources
are not involved in the first phase of the
commit, which is sending prepare to all resource
adapters in the update list (step 645 of FIG. 17).
The actions of a read-only resource cannot affect
the protection of the resources, since in terms of
data consistency, a backout is equivalent to a
commit for a read-only resource.

The only time that the read-only resources are
involved in the two-phase commit procedure is
when they are told the final direction of the commit,
that is, they are told whether to commit their
changes (step 653 of FIG. 17) or told to back out
their changes (step 655 of FIG. 17).
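
The read-only optimization within the two-phase procedure can be sketched as follows. This is a deliberately simplified illustration: the function signature, the callback names, and the list used as a log are hypothetical, and the sketch leaves out the state records and protected-conversation messages of the full procedure.

```python
def two_phase_commit(update, read_only, prepare, tell, log):
    """Simplified two-phase commit that skips read-only resources early.

    prepare(name) -> bool is the vote of an update-list resource.
    tell(name, decision) delivers the final direction.
    log is a list standing in for the recovery facility log.
    """
    # Step 644: only update-list resources are recorded in the log.
    log.append(("SPM Pending", list(update)))
    # Step 645: prepare is sent only to adapters in the update list.
    votes = [prepare(name) for name in update]
    decision = "commit" if all(votes) else "backout"
    for name in update:
        tell(name, decision)
    # Steps 653 / 655: read-only resources only hear the final direction.
    for name in read_only:
        tell(name, decision)
    return decision


prepared, told, log = [], [], []


def prepare(name):
    prepared.append(name)
    return name != "db2-broken"  # hypothetical resource that votes no


def tell(name, decision):
    told.append((name, decision))


outcome = two_phase_commit(["db1", "db2"], ["file"], prepare, tell, log)
failed = two_phase_commit(["db1", "db2-broken"], ["file"], prepare, tell, [])
```

In both runs the read-only resource "file" is never prepared and never logged; it is only told the outcome.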

The following is an example of a two-phase
commit procedure involving three different application
execution environments, which are part of a
system such as System 50 (FIG. 2). Each application
execution environment is executing a different
application. Application A and Application B are 
communicating via a protected conversation; Ap- 
plication B and Application C are communicating 
via a protected conversation. The two-phase com- 
mit procedure is started when Application A at- 
tempts to commit by issuing a commit request B1 
(FIG. 18) to the sync point manager which is cur- 
rently running in the same execution environment 
as Application A. Phase one starts when the sync 
point manager writes the SPM Pending log record 
to the recovery facility log B2 (FIG. 18). The SPM
Pending log record contains the logical unit of work 
identifier for the synchronization point and information
about the synchronization point participants. In
this case, the SPM Pending record shows one
participant, Application B.

After the SPM Pending log record is success- 
fully written to the recovery facility log, the sync 
point manager sends a prepare message via the 
protected conversation adapters to Application B. 
Application B is notified that its conversation part- 
ner Application A wishes to synchronize resources, 
and Application B subsequently issues a commit 
request B3 (FIG. 18) to the sync point manager 
which is currently running in the same execution 
environment as Application B. 

For the sync point manager at B, the first 
phase of the two-phase commit procedure starts by 
writing the SPM Pending record to the recovery 
facility log B4 (FIG. 18). The SPM Pending record 
contains the logical unit of work identifier for the 
synchronization point and information about the 
synchronization point participants. In this case, the 
SPM Pending log record contains information about 
Application A, showing it as the synchronization 
point initiator, and Application C as a synchroniza- 
tion point participant. Once the SPM Pending log 
record is successfully written to the recovery fa- 
cility log, the sync point manager sends a prepare 
message via the protected conversation adapters 
to Application C. Application C is notified that its 
conversation partner Application B wishes to syn- 
chronize resources, and Application C subsequent- 
ly issues a commit request B5 (FIG. 18) to the 
sync point manager which is currently running in 
the same execution environment as Application C. 

The sync point manager starts the first phase 
of the two-phase commit procedure by writing the 
SPM Pending record to the recovery facility log B6 
(FIG. 18). The SPM Pending record contains in- 
formation about the synchronization point partici- 
pants and the logical unit of work identifier for the 
synchronization point. In this instance, the SPM 
Pending record contains information about Applica- 
tion B, which is the synchronization point initiator. 
The SPM Pending record also shows that there
are no synchronization point participants for
Application C.

Since there are no more participants, there is
no need for the sync point manager at C to send a
prepare message via any protected conversation
adapter. The sync point manager at C then sends a
state record to the recovery facility, updating the
state of the syncpoint to Agent, In-Doubt B7 (FIG.
18). Once the state record is successfully written to
the recovery facility log, the sync point manager at
C responds to the prepare message by sending a
request commit message via the protected
conversation adapters to the sync point manager at B.

The sync point manager at B receives the
request commit message from the sync point manager
at C via the protected conversation adapters.
Since only request commit messages were received,
the next step is to send a state record to
the recovery facility, updating the state of the
synchronization point to Agent, In-Doubt B8 (FIG. 18).
Once the state record is successfully written to the
recovery facility log, the sync point manager at B
responds to the prepare message from A by sending
a request commit message via the protected
conversation adapters to the sync point manager at
A.

The sync point manager at A receives the
request commit message from the sync point manager
at B, which completes the first phase of the
synchronization point. The sync point manager
must then make the decision, as the synchronization
point initiator, whether to commit or back out
the logical unit of work. Since only request commit
messages were received by the sync point manager
at A, the sync point manager at A will decide
to commit the logical unit of work. The second
phase of the two-phase commit procedure starts by
the sync point manager recording this decision by
sending a state record to the recovery facility. The
state record changes the state of the synchronization
point to Initiator, Committed B9 (FIG. 18). Once
the state record is successfully written to the
recovery facility log, the sync point manager sends a
committed message via the protected conversation
adapters to the sync point manager at B.

The sync point manager at B receives the
committed message, which completes the first
phase of the two-phase commit procedure. The
second phase is started when the sync point manager
sends a state record to the recovery facility,
updating the state of the synchronization point to
Initiator-Cascade, Committed B10 (FIG. 18). The
sync point manager at B then sends a committed
message to the sync point manager at C via the
protected conversation.

The sync point manager at C receives the
committed message, which completes the first
phase of the two-phase commit procedure. The
sync point manager at C starts the second phase
by sending a state record to the recovery facility,
updating the state of the synchronization point to 
Initiator-Cascade, Committed B11 (FIG. 18). Since 
there are no more participants to receive the com- 
mitted message, the sync point manager at C is 
finished with the synchronization point. To record 
this, the sync point manager at C sends a state 
record to the recovery facility, updating the state of 
the synchronization point to Forget B12 (FIG. 18). 
This state tells the recovery facility that all records 
written by the sync point manager at C for the 
logical unit of work identifier are no longer needed 
and can be erased. After the state record is suc- 
cessfully written to the recovery facility log, the 
sync point manager at C responds to the commit- 
ted message by sending a forget message to the 
sync point manager at B via the protected con- 
versation adapters, which ends the second phase 
of the two-phase commit procedure for the sync 
point manager at C. After the forget message is 
sent, the sync point manager at C returns control 
to Application C, with an indication that the syn- 
chronization point has completed successfully. 

The sync point manager at B receives the 
forget message from the sync point manager at C 
via the protected conversation adapters. The re- 
ceipt of the forget message indicates that the sync 
point manager at B has completed the synchro- 
nization point. To record this, the sync point man- 
ager at B sends a state record to the recovery 
facility, updating the state of the synchronization 
point to Forget B13 (FIG. 18). This state tells the 
recovery facility that all records written by the sync 
point manager at B for the logical unit of work 
identifier are no longer needed and can be erased. 
After the state record is successfully written to the 
recovery facility log, the sync point manager at B 
responds to the committed message by sending a 
forget message to the sync point manager at A via 
the protected conversation adapters, which ends 
the second phase of the two-phase commit proce- 
dure for the sync point manager at B. After the 
forget message is sent, the sync point manager at 
B returns control to Application B, with an indica- 
tion that the synchronization point has completed 
successfully. 

The sync point manager at A receives the
forget message. The receipt of the forget message
indicates that the sync point manager at A has
completed the synchronization point. To record
this, the sync point manager at A sends a state
record to the recovery facility, updating the state of
the synchronization point to Forget B14 (FIG. 18),
which tells the recovery facility that all records
written by the sync point manager at A for the
logical unit of work identifier are no longer needed
and can be erased. This ends the second phase of
the two-phase commit procedure for the sync point
manager at A, which means that the sync point has
completed at every participant. After the state
record is successfully written to the recovery
facility log, the sync point manager at A returns
control to Application A, with an indication that the
synchronization point has completed successfully.
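The second-phase flow described above (log the state record, send committed downstream, wait for forget, log Forget) can be sketched roughly as follows. This is only an illustrative model: the class name SyncPointManager, the in-memory list standing in for the recovery facility log, and the trace strings are assumptions made for the sketch and do not appear in the description. The simplified state names omit the Initiator and Initiator-Cascade qualifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SyncPointManager:
    """Hypothetical stand-in for the sync point manager at one node."""
    name: str
    downstream: Optional["SyncPointManager"] = None
    log: List[str] = field(default_factory=list)  # models the recovery facility log

    def second_phase(self, trace: List[str]) -> None:
        # The state record is logged before the committed message is sent on.
        self.log.append("Committed")
        if self.downstream is not None:
            trace.append(f"{self.name}->{self.downstream.name}: committed")
            self.downstream.second_phase(trace)
            # The downstream node logs Forget before it replies with forget.
            trace.append(f"{self.downstream.name}->{self.name}: forget")
        # Forget tells the recovery facility that the records can be erased.
        self.log.append("Forget")


c = SyncPointManager("C")
b = SyncPointManager("B", downstream=c)
a = SyncPointManager("A", downstream=b)
trace: List[str] = []
a.second_phase(trace)
```

In this sketch the forget messages travel back up the tree in the reverse order of the committed messages, which mirrors the C to B to A sequence in the text.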

COORDINATED HANDLING OF ERROR CODES
AND INFORMATION DESCRIBING ERRORS IN A
COMMIT PROCEDURE

Figures 29-32 illustrate components of system
50A which provide to application 56A a return
code if any resource or protected conversation
reports an error or warning. Also, application 56A
can request detailed error information from each
resource and protected conversation. The detailed
error information identifies the reporting resource
and either describes the reason for a synchronization
point error or gives a warning about the
synchronization point.

Application 56A is running in application execution
environment 52A (see Figure 32) in system
50A. Resource adapter 62A is the adapter for a
shared file resource manager 63A, resource adapter
62G is the adapter for SQL resource manager
63G, and protected conversation adapter 64A is the
adapter for a protected conversation with system
50B via protected conversation adapter 64B. In this
example, adapters 62A and 64A have the same
product identifier since they are integral components
of the system control program in system
50A; adapter 62G has a unique product identifier
since it is part of a different product; adapters 62A
and 64A have different resource adapter exit
identifiers. For illustrative purposes, resource adapter
62G produces error blocks that are indecipherable
to application 56A and has a prior art function to
return detailed error information to application 56A.

In response to work requests (Step 651, Figure
29), adapters 62A, 62G, and 64A register (Step
653) with sync point manager 60. Sync point manager
60 creates registry objects 162A, 162B, and
162C, filling in the identifiers of the participating
resources (shared file resource manager 63A, SQL
resource manager 63G and the protected conversation
partner in system 50B). Also, the registration
information includes the resource adapter exit routine
names, product identifiers for the resources
and protected conversation, and the required
length of an error block for each resource. The
resource adapter exit name is required when a
product, such as the system control program in
system 50A in this illustrated example, owns two
resource types. The product identifier and the
resource adapter exit name both identify the
participating resource type, e.g., a shared file resource
manager, a SQL resource manager, or a protected
conversation. All resource adapters of the same
resource type within an execution environment use
error blocks from the same pool to reduce the
paging set of the system 50A. (See Figure 31 for a
graphical description.) If a resource asks in Step
653 (Figure 29) for an error block of the same size
as another resource type, the error block pool is
shared by both resources.
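A minimal sketch of the pooling idea follows, assuming that pools are keyed simply by the requested error block length. The names RegistryEntry and ErrorBlockPools, the identifier strings, and the block lengths are hypothetical and chosen only for illustration; the description does not specify how the pools are implemented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry object built when an adapter registers."""
    resource_id: str
    product_id: str
    exit_name: str
    error_block_length: int


class ErrorBlockPools:
    """One free list per block length, so equal-sized requests share a pool."""

    def __init__(self) -> None:
        self._free: Dict[int, List[bytearray]] = {}

    def acquire(self, entry: RegistryEntry) -> bytearray:
        free = self._free.setdefault(entry.error_block_length, [])
        return free.pop() if free else bytearray(entry.error_block_length)

    def release(self, block: bytearray) -> None:
        block[:] = bytes(len(block))  # clear the block before it is reused
        self._free.setdefault(len(block), []).append(block)


pools = ErrorBlockPools()
shared_file = RegistryEntry("63A", "SCP", "FILE_EXIT", 256)
conversation = RegistryEntry("50B-partner", "SCP", "CONV_EXIT", 256)
sql = RegistryEntry("63G", "SQLPROD", "SQL_EXIT", 512)

block = pools.acquire(shared_file)
pools.release(block)
reused = pools.acquire(conversation)  # same length, so the same pool serves it
sql_block = pools.acquire(sql)        # different length, separate pool
```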

For each registrant (62A, 62G, and 64A) the 
parameter list to call a resource adapter exit is built 
by sync point manager 60; it contains the address 
and length of usable error information of the re- 
source's error block. Placing the usable error in- 
formation length in the registry entry results in 
system 50A's paging set being unaffected if no 
error occurs. 

Next, application 56A requests a commit from
sync point manager 60 (Step 654, Figure 29). If
application 56A desires detailed information from
shared file resource manager 63A in the event an
error occurs during this synchronization point (a
prior-art function of the shared file resource manager
in system 50A), then application 56A transmits, on
the Commit verb (Step 654, Figure 29), the error data
address of a data area in its execution environment
in which to store a copy of the detailed error
information. This area is used if resource manager 63A
reports an error or warning. The sync point manager
60, instead of the shared file resource adapter 62A,
receives the verb, and the error data address is
saved by the sync point manager 60. On completion
of the synchronization point all errors and
warnings (stored in error block 66A, Figure 29)
are moved to application 56A's error data
area (not shown). Thus, compatibility with the
prior-art error-pass-back architecture of the shared
file resource manager is preserved.

In Step 655 (Figure 29) sync point manager 60
passes each resource adapter (62A, 62G, 64A,
shown in Figure 32) the address of its error block
(objects 66A-C) saved in registry objects 162A-C
that were built for each resource adapter when the
resource adapter registered (Step 653). If there are
no failures, the commit from Step 654 is complete,
and sync point manager 60 reports back to application
56A the fact that the updates have been
committed (Step 657).

However, if a resource detects errors or warnings,
its adapter, 62A, 62G or 64A (Step 670 in
Figure 30), fills in the detailed error information
using the error block 66A-C (Figure 29) as a place
to store whatever is required by its design and
updates the usable error length, which is an
input/output parameter. Since a resource adapter
exit can be called many times during a two-phase
commit procedure, it can append error information
to the error block if necessary; it may have three
warnings and one severe error, for instance; it
manages the usable error length itself (Step 672).
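The append behaviour can be pictured with a short sketch in which the usable error length is passed in and returned on every call. The function name append_error, the record encoding, and the 32-byte block size are assumptions for illustration; the actual layout of an error block is left to each adapter's design.

```python
def append_error(block: bytearray, usable_length: int, record: bytes) -> int:
    """Append one record to the error block and return the new usable length."""
    end = usable_length + len(record)
    if end > len(block):
        raise ValueError("error block too small for record")
    block[usable_length:end] = record
    return end


error_block = bytearray(32)  # block handed to the adapter exit
usable = 0                   # input/output parameter: bytes in use so far

# Three warnings and one severe error, reported over several exit calls.
for record in (b"W001;", b"W002;", b"W003;", b"S900;"):
    usable = append_error(error_block, usable, record)
```

Because the usable length starts at zero and only grows when an adapter reports something, a caller can tell an empty block from a filled one without inspecting its contents.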

Sync point manager 60 receives from the resource
adapter exit a single return code in a common
format and proceeds with the two-phase commit
procedure's logic (Step 673); sync point manager
60 neither knows nor cares about the contents
of the error blocks 66A-C. If the two-phase commit
procedure's logic dictates an error or warning, the
sync point manager transmits a consolidated return
code to application 56A (Step 657 in Figure 29 and
674 in Figure 30).

On receipt of the return code, application 56A
asks for a detailed error block by calling a routine
provided by sync point manager 60 (Step 676,
Figure 30). In response, the error block manager
(Function 690, Figure 32) within sync point manager
60 looks for a non-empty error block and
moves it to application 56A's buffer. Other output
parameters are the product identifier and resource
adapter exit name for the owner of this error block.
Application 56A then examines the product identifier.
If the reporting product is the system control
program in system 50A (decision block 678, Figure
30), then application 56A examines the resource
adapter exit name to distinguish between the two
system control program adapters. Now it can look
at the error block for the resource name and the
cause of failure (Step 680A or B). Mapping macros
are provided by the system control program in
system 50A for the shared file resource manager
and for protected conversations to aid in reading
error blocks. Also a routine (Interaction 693, Figure
32) is provided by each adapter to reformat its
error block into a convenient form, a parameter list.
Existing applications using the shared file resource
manager require no change since its error-pass-back
method is unchanged. Protected conversations
are new, so the compatibility objective is not
violated for existing applications using communications.

If the product is a SQL resource manager
(decision block 681, Figure 30), then the error block
must be deciphered, assuming for illustration that it
is not in a form which application 56A can presently
understand. Thus, application 56A asks resource
adapter 62G to identify the type of error in
a form that application 56A can understand (Step
682). In response (Step 683), the SQL resource
adapter 62G reads the error block from the sync
point manager, using a routine very similar to the
routine used by application 56A but specialized for
resource adapters. Note that the SQL resource
adapter 62G and application 56A are given unique
tokens so that both can loop through the same
error blocks without confusion. SQL resource
adapter 62G reformats the data in error block 66C
(Figure 29) to a form compatible with application
56A (Step 684, Figure 30), and then sends the
reformatted detailed error information to application
56A (Step 685). It should be noted that only a
minor internal change is required to this example of
a pre-existing SQL resource adapter to participate
in coordinated handling of error information, i.e., it
must ask sync point manager 60 for its error
blocks. No change is required by pre-existing
applications if only one resource is updated by
application 56A; the external appearance of the SQL
resource adapter error-pass-back interface is
preserved. Additional error codes indicating that
application 56A is using a new function, coordinated
synchronization point, are not considered an
incompatibility.

Application 56A then queries sync point man- 
ager 60 to determine if there are additional error 
blocks (Step 676 Figure 30). If so (Decision block 
677), Steps 678-685 are repeated to obtain one or 
more additional error blocks from sync point man- 
ager 60. If there are no additional error blocks, 
decision block 677 leads to Step 688 in Figure 29 
in which application 56A continues processing, ei- 
ther to pursue a different function or to attempt to 
correct the failure. 
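The retrieval loop of Steps 676-677, together with the per-caller tokens mentioned earlier, might be modeled as in the sketch below. The class ErrorBlockManager, the method next_error_block, the token strings, and the sample block contents are all hypothetical; the description states only that a routine returns the next non-empty error block along with the product identifier and resource adapter exit name.

```python
from typing import Dict, List, Optional, Tuple

# (product identifier, resource adapter exit name, usable error data)
Block = Tuple[str, str, bytes]


class ErrorBlockManager:
    """Hypothetical model of the error block manager in the sync point manager."""

    def __init__(self, blocks: List[Block]) -> None:
        self._blocks = blocks
        self._cursor: Dict[str, int] = {}  # one read position per caller token

    def next_error_block(self, token: str) -> Optional[Block]:
        i = self._cursor.get(token, 0)
        while i < len(self._blocks):
            block = self._blocks[i]
            i += 1
            if block[2]:  # skip error blocks with no usable data
                self._cursor[token] = i
                return block
        self._cursor[token] = i
        return None


manager = ErrorBlockManager([
    ("SCP", "FILE_EXIT", b"file full"),
    ("SCP", "CONV_EXIT", b""),
    ("SQLPROD", "SQL_EXIT", b"\x01\x02"),
])

seen = []
while (block := manager.next_error_block("application-56A")) is not None:
    product_id, exit_name, data = block
    seen.append((product_id, exit_name))

# A resource adapter using its own token starts from the first block again.
adapter_first = manager.next_error_block("adapter-62G")
```

Keeping a separate cursor per token is one way to let the application and an adapter walk the same set of blocks independently.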

The sync point manager 60 keeps error blocks 
until the next synchronization point as described in 
the foregoing section entitled "Registration of Re- 
sources For Commit Procedure." 

LOG NAME EXCHANGE FOR RECOVERY OF 
PROTECTED RESOURCES 

When application 56 (FIG. 2) issues a sync
point request, a two-phase commit procedure is
initiated for committing changes for all protected
resources. Protected resources include
resources such as data bases managed by a resource
manager, as well as a special classification
of resources called protected conversations, which
represent a distributed partner application. As
noted in the section "Coordinated Sync Point
Management of Protected Resources for Distributed
Application", the first phase in the two-phase commit
procedure is to prepare the resources for the
commit. Once all resource managers have agreed
to a commit during the first phase, then the second
phase accomplishes the actual commit. If any
resource is unable to prepare during the first phase,
then all the resources are ordered to back out their
changes during the second phase instead of
committing them. All resource data changes are subject
to back out until the time that they are actually
committed.
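As a rough illustration of the rule just stated (commit only if every resource prepares, otherwise back out everywhere), consider the following sketch. The Resource class, its can_prepare flag, and the sync_point function are assumptions for illustration only; logging and failure recovery are deliberately left out.

```python
from typing import List


class Resource:
    """Hypothetical protected resource; can_prepare models its phase-one vote."""

    def __init__(self, name: str, can_prepare: bool = True) -> None:
        self.name = name
        self.can_prepare = can_prepare
        self.outcome = "pending"

    def prepare(self) -> bool:
        return self.can_prepare

    def commit(self) -> None:
        self.outcome = "committed"

    def back_out(self) -> None:
        self.outcome = "backed out"


def sync_point(resources: List[Resource]) -> str:
    # Phase one: every resource is asked to prepare.
    votes = [r.prepare() for r in resources]
    # Phase two: commit only if all agreed, otherwise back out everywhere.
    if all(votes):
        for r in resources:
            r.commit()
        return "committed"
    for r in resources:
        r.back_out()
    return "backed out"


good = [Resource("data base"), Resource("protected conversation")]
bad = [Resource("data base"), Resource("protected conversation", can_prepare=False)]
good_result = sync_point(good)
bad_result = sync_point(bad)
```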

In order to support a recovery procedure, as
described in the section "Recovery Facility For
Incomplete Sync Points For Distributed Application",
for completing a sync point when the sync
point cannot complete due to a failure, it is necessary
that sync point information be previously
stored and retained in recovery facility logs 72 and
resource manager logs 800, which are in non-volatile
storage facilities. Logging is done by each
sync point manager 60 as well as by each participating
resource manager 63. Information recorded
in the log includes the current state of the sync
point from the standpoint of the logging sync point
manager or resource manager, the current name(s)
associated with the sync point log of known sync
point participants, and, in the case of sync point
managers, information required to establish
conversations with sync point participants at the time
of recovery from sync point failures.

Information concerning the log name of known
sync point participants is logged separately or
partitioned from the remaining sync point information.
The log name information is recorded in a log
name log 72A2 (FIG. 19), while the remaining
information is recorded in a sync point log 72A1.

When a failure occurs in recording information
in any of the sync point logs, requiring that the log
be reinitialized, in effect beginning a new log, the log
is assigned a new name. When this occurs it is
important that other sync point managers and resource
managers that are sync point participants
with the holder and maintainer of the new log be
notified that the log has been reinitialized and that
a new name is in effect.

It is essential for automatic resynchronization
that each sync point manager and participant have
valid sync point logs. That is, the logs at the time
of resynchronization must be the same logs that
were used during sync point. If any logs have been
replaced or damaged then resynchronization cannot
proceed normally. To ensure that all logs are
correct, there is a pre-sync point agreement on the
log names of each sync point manager and participating
resource, which is accomplished by a procedure
called exchange of log names. There is another
exchange of log names just before the
resynchronization begins, whereupon, the log names of
all participants being determined to be the same as
when the sync point began, the resynchronization
can proceed to recover the failed sync point, knowing
that no participant had a log reinitialization.
Without this procedure, invalid sync point log
information could lead to a failure in or erroneous
results from the recovery processing.

As an optimization for protected conversations
between application environments in the same system
(for example application environments 52A and
52B in system 50A) it is not necessary to exchange
log names since the respective sync point managers
60A and 60B share the same recovery facility
70A and recovery facility log 72A. When there
is a common recovery facility log 72A, the step of
synchronizing logs (by exchanging log names) is
not necessary and may be omitted. Sync point
manager logging is accomplished by the common
recovery facility 70 which resides in the same
system as the supported sync point manager(s) 60.
All sync point managers 60A, 60B, and 60C in a
system 50A share the common recovery facility
70A and the same supporting pair (sync point and
log name) of logs in recovery facility log 72A.

FIG. 33 illustrates three systems 50A, 50D, and
50F, the recovery facilities in each, and communications
between the systems. Each application
environment 52A, 52B, 52D, 52F, and 52G includes
an application program 56A, 56B, 56D, 56F, and
56G respectively (not illustrated), which utilizes a
sync point manager 60A, 60B, 60D, 60F, and 60G,
respectively, for purposes of coordinated resource
recovery. A sync point manager uses the recovery
facility in its system to manage the sync point and
log name logs required for recovery from a failing
sync point. For example, the sync point managers
in application environments 52A and 52B use the
recovery facility 70A to record in log 72A. Resource
managers 63A, 63B, 63D, 63E, 63F, and
63G maintain their own sync point and log name
logs 800A, 800B, 800D, 800E, 800F, and 800G,
respectively. The illustrated scopes of sync points
are indicated by solid lines and arrows. Although
sync points may be initiated by any participant and
the scope of a sync point is dynamic, the illustration
is static for simplicity. For the
illustrated static cases, sync points flow between
application environments 52B to 52D to 52F via the
associated sync point managers and protected
conversation adapters (not shown) via communication
solid lines 801 and 802; and from application
environments 52A, 52B, 52D, 52F, and 52G via the
associated sync point managers and resource
adapters to the resource managers 63A, 63B, 63D,
63E, 63F, and 63G via communications solid lines
803A-1, 803A-2, 803B, 803D, 803E, 803F, and
803G, respectively. The dotted lines show communication
paths employed at the time of pre-sync
point agreements and at the time of resynchronization
for recovering a failing sync point. For resource
managers, this dotted line communication is
between the resource manager and the recovery
facility of the system of the originating application
environment, for example, resource manager 63E
to 70A, not 70B.

Three sync point scopes are included in FIG.
33. The first involves a single application environment
52A (and sync point manager) and utilizes
two resource managers 63A and 63E. The second
sync point scope involves three application environments
52B, 52D, and 52F each involving various
participating resource managers (63B for 52B, 63D
for 52D, and 63F and 63G for 52F), as further illustrated
by a sync point tree in FIG. 34.



The FIG. 19 block diagram and the FIGS. 20A, 20B, 21,
and 22 flowcharts illustrate by example the process
for log name exchange involving a protected conversation
between system 50A and 50D. Application
56A initiates a protected conversation with
application 56D (step 831 in FIG. 20A). Application
56A is running in application environment 52A in
system 50A and application 56D is running in application
environment 52D in system 50D. The conversation
initiation includes specification of a path
(system identifier), "B" in the current example, and
a resource identifier for the application partner. The
path identifies system 50D and the resource identifier
identifies target application 56D. Resource
identifiers are explained in detail below in this
section. The system control program includes a
facility which acts as the resource manager of
applications, to support the establishment of an
application resource identifier for applications and
to recognize those identifiers when used in conversation
initiation, then to either activate the application
in an execution environment or, if already
activated, route the new conversation to that active
application. Thus conversation routing for applications
utilizes paths (system identifiers) and resource
identifiers, where paths accomplish the routing between
systems, as interpreted by communication
facilities, each of which represents a system, and
resource identifiers accomplish routing to or activation
of an application in an execution environment
within a system, as interpreted by the system
control program which acts as the resource manager
for application resources.

Upon receipt of this conversation initiation,
communication facility 57A searches its exchange
log name status table (ELST) 208A for an entry for
the current path, path B (step 833 in FIG. 20A).
The exchange log name status table entry for path
B indicates by status zero that no protected conversations
have occurred on this path since system
50A was last initiated. Therefore (decision step 834
in FIG. 20A), the exchange log name status table
entry 208A for path B is changed to status one
(step 836 in FIG. 20A), the conversation initiation
message 505 FIG. 19 is intercepted, and the conversation
is suspended by the communication facility
57A (step 837 in FIG. 20A). Next, communication
facility 57A FIG. 19 sends message 200 FIG.
19 on a control path to the local recovery facility
70A to indicate that an exchange of log names
should be initiated for path B before the conversation
initiation is accepted by the communication
facility 57A (step 838 in FIG. 20A).

Recovery facility 70A receives this message
(step 850 in FIG. 21) and then sets ELST 207A
entry for path B to status 1, indicating that exchange
of log names for path B is in progress (step
851 in FIG. 21). Then recovery facility 70A FIG. 19
initiates a non-protected conversation on communication
path B (message 202 FIG. 19). Since the
conversation is NOT "protected", there is no possibility
of interception by a communication facility
since only protected conversations are monitored
for interception to enforce log name exchange procedures.
The routing from system 50A to system
50D through their communication facilities is as
described above.

The conversation initialization also utilizes a
globally reserved resource identifier called the
protected_conversation_recovery resource identifier,
which permits routing to the recovery facility
70D of the identified target system 50D. As each
recovery facility 70A, 70D is initialized, the recovery
facility identifies itself to the system control
program as the local resource manager for the
global resource called
"protected_conversation_recovery". The result is
that the system control program for system 50D
routes conversations with the
protected_conversation_recovery resource identifier
to the local recovery facility 70D and that
recovery facility 70D also determines, based on the
protected_conversation_recovery resource identifier
that was used to initiate the conversation, that
the purpose of the conversation is to exchange log
names with another recovery facility 70A. The initial
message 202 FIG. 19 in this conversation includes
the log name of log 72A along with an indication of
whether that log name is "new", that is whether the
name of the log was changed to reflect a new log
as a result of a major failure/loss associated with
the "old" log (step 852 in FIG. 21). The current
example assumes the log is not new. Recovery
facility 70A waits for a response to message 202
FIG. 19.

After recovery facility 70D receives the log
name information transmitted by recovery facility
70A along communication line 202 (step 870 in
FIG. 22), recovery facility 70D sets ELST 207D for
path B to status 1 and the local communication
facility 57D is notified via message 203 FIG. 19 to
also change ELST 208D to status 1 for path B
(step 871 in FIG. 22). Steps 841 in FIG. 20B, 842 in
FIG. 20B, 843 in FIG. 20B, and 846 in FIG. 20B
illustrate the steps for changing the ELST in a
communication facility. Recovery facility 70D determines
from message 202 FIG. 19 that the log
name of recovery facility 70A is not new (decision
step 872 in FIG. 22) and that its own log is also not
new (decision step 876 in FIG. 22), and finally that
the log name in message 202 FIG. 19 matches
the log name stored in recovery facility 70D
log name log 72D2 entry for path B (decision step
877 in FIG. 22); therefore ELST 207D is set to
status 2 for path B and the local communication
facility 57D is notified via message 203 FIG. 19 to
also change ELST 208D to status 2 for path B
(steps 879 in FIG. 22, 841 in FIG. 20B, and 842 in
FIG. 20B). Then recovery facility 70D responds
(message 206 FIG. 19) normally to recovery facility
70A by passing the log name of its log 72D and an
indication of whether it is new or not (step 882 in
FIG. 22).

Recovery facility 70A receives this normal response
(decision step 853 in FIG. 21) and, since
recovery facility 70A's log 72A is not new (decision
step 857 in FIG. 21) and recovery facility 70D's log
72D is not new according to message 206 FIG. 19
(decision step 858 in FIG. 21), recovery facility 70A
successfully matches the name of log 72D sent by
recovery facility 70D in message 206 FIG. 19 with
the log name stored in the log 72A2 entry for path
B (decision step 859 in FIG. 21) and therefore sets
the ELST 207A entry for path B to status 2 and notifies
the local communication facility 57A via message
204 FIG. 19 to set ELST 208A to status 2 (step 862
in FIG. 21). Then recovery facility 70A does a
normal termination of the conversation on path B
with recovery facility 70D (step 863 in FIG. 21),
allowing recovery facility 70D to complete normally
(decision step 883 in FIG. 22 and step 886 in FIG.
22). Once the communication facility 57A has received
message 204 FIG. 19 to post the status for
path B in ELST 208A (steps 841 in FIG. 20B and
842 in FIG. 20B), the intercepted and suspended
conversation 505 on path B is permitted to complete
its initialization (decision step 843 in FIG.
20B, and steps 845 in FIG. 20B and 846 in FIG.
20B). This completion removes the suspended status
of the conversation and permits it to flow to its
destination, communication facility 57D. In the target
communication facility 57D there is a protected
conversation arrival event (step 832 in FIG. 20A),
then the search for the path entry in the ELST
208D (decision step 834 in FIG. 20A) indicates a
status of 2, permitting the conversation initiation to
flow normally (step 839 in FIG. 20A) to application
56D.

This completes the normal case flow for con- 
versation interception and exchange of log names. 
Some additional cases are also illustrated. Steps 
834 in FIG. 20A and 835 in FIG. 20A illustrate that 
additional conversations on the same path are also 
suspended once the status of 1 has been estab- 
lished to indicate that an exchange of log names 
for the path is already in progress. 
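The interception logic in the communication facility can be summarized as a small state machine over the ELST status of a path (0: no exchange yet, 1: exchange in progress, 2: exchange complete). The sketch below is illustrative only: the class CommunicationFacility, its method names, the conversation labels, and the callback standing in for message 200 are assumptions, and the treatment of a posted status of 1 as leaving suspended conversations untouched is likewise an assumption.

```python
from typing import Callable, Dict, List, Tuple


class CommunicationFacility:
    """Hypothetical model of ELST handling in a communication facility."""

    def __init__(self, request_exchange: Callable[[str], None]) -> None:
        self.elst: Dict[str, int] = {}              # path -> status 0, 1 or 2
        self.suspended: Dict[str, List[str]] = {}   # path -> waiting conversations
        self.request_exchange = request_exchange    # stands in for message 200

    def on_protected_conversation(self, path: str, conversation: str) -> str:
        status = self.elst.get(path, 0)
        if status == 2:
            return "flow"  # log names already exchanged for this path
        self.suspended.setdefault(path, []).append(conversation)
        if status == 0:
            # First protected conversation on the path: start the exchange.
            self.elst[path] = 1
            self.request_exchange(path)
        return "suspended"

    def post_status(self, path: str, status: int) -> Tuple[str, List[str]]:
        """Apply a status posted by the local recovery facility."""
        self.elst[path] = status
        if status == 1:
            return "pending", []  # ASSUMPTION: nothing is released or rejected
        waiting = self.suspended.pop(path, [])
        return ("released" if status == 2 else "rejected"), waiting


requests: List[str] = []
facility = CommunicationFacility(requests.append)
first = facility.on_protected_conversation("B", "conv-505")
second = facility.on_protected_conversation("B", "conv-506")
outcome = facility.post_status("B", 2)
third = facility.on_protected_conversation("B", "conv-507")
```

In this model only the first conversation on a path triggers an exchange request; later ones simply wait until the status is posted.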

In the case where the target recovery facility
70D finds a log name mismatch between the log
name sent in message 202 FIG. 19 and the one
stored in log 72D2 for path B (decision step 877 in
FIG. 22), an error is returned in message 206 FIG.
19 (step 880 in FIG. 22) and ELST 207D is set to
status 0 for path B and communication facility 57D
is notified to change its ELST 208D via message
203 FIG. 19 similarly (steps 841 in FIG. 20B, 842 in
FIG. 20B and 881 in FIG. 22).

In the case where recovery facility 70D receives
a message 202 FIG. 19 indicating that the
source log 72A is new (decision step 872 in FIG.
22) and log 72D is also new (decision step 873 in
FIG. 22), the new log name for 72A is stored in log
72D2 for path B (step 878 in FIG. 22) and normal
completion continues as before (steps 879 in FIG.
22, 882 in FIG. 22 etc.).

In the case where recovery facility 70D receives
a message 202 FIG. 19 indicating that the
source log 72A is new (decision step 872 in FIG.
22), but log 72D is not new (decision step 873 in
FIG. 22), and it is determined from the sync point
log 72D1 that there is an unresolved sync point
recovery (outstanding resynchronization) for path
B (decision step 874 in FIG. 22), an error message
is generated for the system 50D operator (step 875
in FIG. 22), an error is returned to recovery facility
70A in message 206 FIG. 19 (step 880 in FIG. 22),
ELST 207D is changed to status 0, and the local
communication facility is notified via message 203
FIG. 19 to change ELST 208D to status 0 (steps
881 in FIG. 22, 841 in FIG. 20B, and 842 in FIG.
20B) before return (step 882 in FIG. 22).
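The decisions made by the target recovery facility in the cases above can be collected into one function, sketched below. The function name target_exchange, its parameters, and the sample log names are hypothetical. Two branches are not spelled out in the text and are marked as assumptions in the comments: a new source log with no outstanding resynchronization at a target whose log is not new, and a source log that is not new arriving at a target whose own log is new.

```python
from typing import Optional, Tuple


def target_exchange(
    sent_name: str,
    sent_is_new: bool,
    own_is_new: bool,
    stored_name: Optional[str],
    resync_outstanding: bool,
) -> Tuple[str, int, Optional[str]]:
    """Return (response, ELST status, log name kept for the path)."""
    if sent_is_new:
        if not own_is_new and resync_outstanding:
            # Unresolved recovery still depends on the old log: reject.
            return "error", 0, stored_name
        # Both logs new: store the partner's new log name and continue.
        # ASSUMPTION: the same applies when the target log is not new
        # and no resynchronization is outstanding.
        return "normal", 2, sent_name
    if own_is_new:
        # ASSUMPTION: this branch is not described in the text.
        return "normal", 2, sent_name
    if sent_name == stored_name:
        return "normal", 2, stored_name  # names match, exchange succeeds
    return "error", 0, stored_name       # log name mismatch


match = target_exchange("LOG.A", False, False, "LOG.A", False)
mismatch = target_exchange("LOG.X", False, False, "LOG.A", False)
both_new = target_exchange("LOG.A2", True, True, None, False)
blocked = target_exchange("LOG.A2", True, False, "LOG.A", True)
```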

When recovery facility 70A detects an error
response in message 206 FIG. 19 from recovery
facility 70D (decision step 853 in FIG. 21) and
there is an outstanding resynchronization indicated
in log 72A1 (decision step 854 in FIG. 21), then a
message is sent to the system 50A operator (step
855 in FIG. 21) and ELSTs 207A and 208A are
changed to status 0 (step 856 in FIG. 21). ELST
208A is changed to status 0 via message 204 FIG.
19 to the communication facility 57A (steps 841 in
FIG. 20B, and 842 in FIG. 20B). This results in an
error return to the application 56A that originated
the intercepted conversation, and rejection of the
conversation (step 844 in FIG. 20B). If no
resynchronizations are outstanding (decision step 854 in
FIG. 21) then the operator message is avoided.

When a new log name is returned to recovery
facility 70A in message 206 FIG. 19 from recovery
facility 70D (decision step 857 in FIG. 21), then it is
stored in the log 72A2 entry for path B (step 861 in
FIG. 21), ELST status of 2 is set for path B (step
862 in FIG. 21), and the communication facility 57A
permits the conversation to be released from suspension
(steps 841 in FIG. 20B, 842 in FIG. 20B,
decision step 843 in FIG. 20B, and step 845 in FIG.
20B).

When recovery facility 70A detects that the log
name returned by recovery facility 70D in message
206 FIG. 19 does not match that stored in log
72A2 for path B (decision step 858 in FIG. 21 and
859 in FIG. 21), or a new log name for 72D2 is
returned (decision step 858 in FIG. 21) and recovery
facility 70A determines from log 72A1 that
there are outstanding resynchronizations required for
path B (decision step 860 in FIG. 21), then recovery
facility 70A signals recovery facility 70D that
there is a serious error by abnormally terminating
the conversation that supported messages 202 and
206 FIG. 19 (step 864 in FIG. 21), generates a
message for the operator of system 50A (step 865
in FIG. 21), resets the status of ELST 207A, and,
through message 204 FIG. 19 to communication
facility 57A, also resets the status of ELST 208A
(step 866 in FIG. 21). This results in an error return
to the application 56A that originated the intercepted
conversation, and rejection of the conversation
(step 844 in FIG. 20B).

After recovery facility 70D responds to recovery
facility 70A in all cases (step 882 in FIG. 22), it
can nevertheless detect (decision step 883 in FIG.
22) errors signalled by recovery facility 70A (step
864 in FIG. 21) through abnormal conversation
termination. When this occurs, the path B entries in
ELST 207D and, through message 203 FIG. 19 to
communication facility 57D (and steps 841 in FIG.
20B and 842 in FIG. 20B), ELST 208D are reset to 0
status (step 884 in FIG. 22) and the log name entry
in log 72D2 for path B is erased (step 885 in FIG.
22), negating previous step 878 in FIG. 22.

As illustrated by ELSTs 208A and 208D in FIG. 19,
each communication facility controls conversation
interception for each path (status other
than 2), initiation of log name exchange (status 0),
and normal conversation flow (status 2). The ELSTs
207A and 207D maintained by each recovery
facility are similar, but are optional optimizations.
They permit bypassing messages to the local com- 
munication facility to update the ELST of the com- 
munication facility when the update is not really 
necessary. This is further illustrated below. 
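
The status semantics just described can be sketched as a small decision function. This is only an illustration under stated assumptions, not the implementation of the described system: the `Action` names, the dictionary representation of an ELST, and the treatment of a missing path entry as status 0 are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for what a communication facility does with a conversation."""
    PASS_THROUGH = "pass_through"                      # normal conversation flow
    INTERCEPT = "intercept"                            # hold the conversation
    INTERCEPT_AND_EXCHANGE = "intercept_and_exchange"  # hold and request a log name exchange


def on_protected_conversation(elst: dict, path: str) -> Action:
    """Decide how to route a protected conversation from the ELST status of its path."""
    # Assumption: a path with no entry is treated like status 0.
    status = elst.get(path, 0)
    if status == 2:
        return Action.PASS_THROUGH
    if status == 0:
        return Action.INTERCEPT_AND_EXCHANGE
    # Any other status: intercept without starting another exchange.
    return Action.INTERCEPT
```

With an ELST of `{"B": 2}`, a conversation on path B flows normally; after the entry is reset to 0, the same call reports that the conversation must be intercepted and a log name exchange requested.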

FIG. 19 further illustrates the processing re- 
quired when one of the systems experiences a 
failure. Assume that there is a failure of commu- 
nication facility 57A, recovery facility 70A or the 
communication paths between them. Any such fail- 
ure causes all entries in the exchange log status 
tables 208A, 207A in communication facility 57A 
and recovery facility 70A to be reset to status zero. 
This is essential because any such failure could 
otherwise mask the possibility that there may have 
also been a log failure. Because of this possibility 
all sync point agreements are reset by zeroing the
status of the exchange log name status table entries.
It should be noted that failure of either application
environment 52A or 52D does not cause a
resetting of the exchange log name status tables
because the application environments do not directly
affect the log name exchange process. This
is important because application environments are
more prone to failure than the system facilities.
Likewise, failure of one of several application
environments sharing a common logical path (not
illustrated) does not affect the processing for other
applications that utilize the path.

Assume further that after the failure of communication
facility 57A, recovery facility 70A or the
control paths between them, application 56D initiates
a conversation along path B to application
56A in application environment 52A. This conversation
is not intercepted by communication facility
57D because the exchange log name status table
within communication facility 57D indicates status
two for path B; the tables in communication facility
57D were not reset upon the failure in system 50A.
However, when the conversation proceeds to communication
facility 57A, there is a protected conversation
arrival event (step 832 in FIG. 20A), the
search of the ELST 208A (decision step 833 in FIG.
20A) indicates status 0, the communication facility
57A intercepts the routing of the conversation
(steps 836 in FIG. 20A and 837 in FIG. 20A), and
therefore communication facility 57A requests a log
name exchange (step 838 in FIG. 20A) by message
200 FIG. 19 to recovery facility 70A. This causes a
repetition of the previously described log name
exchange process. When the log name exchange
is received at recovery facility 70D during the
exchange process, the exchange log name status
table within recovery facility 70D indicates status
two for the path B entry. Therefore, recovery
facility 70D does not notify communication facility
57D to change the exchange log name status table
for path B; such exchange is not necessary. This is
the only difference in this log name exchange
process from that described above before the failure.
At the completion of the log name exchange
process, recovery facility 70A notifies communication
facility 57A via message 204 FIG. 19 to
change the status for path B from zero to two.
Then, the communication facility 57A releases the
conversation along path B so that it flows to
application environment 52A.
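
The failure scenario above can be traced with a short sketch. It assumes each ELST is a dictionary from path name to status; the function names `reset_on_failure` and `complete_exchange` are hypothetical and stand in for the behavior described in the text, namely that a recovery facility whose own table already shows status 2 skips the message to its communication facility.

```python
def reset_on_failure(*tables: dict) -> None:
    """Zero every path entry in the given exchange log name status tables."""
    for table in tables:
        for path in table:
            table[path] = 0


def complete_exchange(recovery_elst: dict, comm_elst: dict, path: str) -> bool:
    """Record a successful log name exchange for a path.

    Returns True if the communication facility's table had to be updated,
    False if the recovery facility's table already showed status 2.
    """
    already_agreed = recovery_elst.get(path) == 2
    recovery_elst[path] = 2
    if already_agreed:
        return False          # optimization: no message to the communication facility
    comm_elst[path] = 2       # e.g. message 204 to communication facility 57A
    return True


# Before the failure, both systems agree on path B.
elst_207a, elst_208a = {"B": 2}, {"B": 2}
elst_207d, elst_208d = {"B": 2}, {"B": 2}

# Failure in system 50A resets only that system's tables.
reset_on_failure(elst_207a, elst_208a)

# The repeated exchange updates system 50A but needs no update in system 50D.
updated_d = complete_exchange(elst_207d, elst_208d, "B")
updated_a = complete_exchange(elst_207a, elst_208a, "B")
```

In this trace `updated_d` is false and `updated_a` is true, matching the single difference the text notes between the first exchange and the one that follows a failure.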

It should be noted that in the foregoing two
examples, recovery facility 70A initiated the log
name exchange on path B via message 202 FIG.
19. However, if instead, communication facility 57D
were the first communication facility to intercept a
protected conversation, then recovery facility 70D
would initiate the log name exchange process as
illustrated by message 206 FIG. 19. It should also
be noted that a single log name exchange is sufficient
to satisfy the pre-sync point agreement for all
application environments in the same system 50A
that utilize the same path for protected conversations.
The recording in the common exchange log
name status table 208A makes this possible.
Moreover, it should be noted that the single log
name exchange process described above is sufficient
to satisfy the requirement for pre-sync point
agreement even when there is more than one application
environment in each system 50A and 50D
involved in the protected conversation because all
of the application environments within the same
system share the same log 72. Also, when a protected
conversation is initiated from application environment
52A to application environment 52B in
the same system 50A, then communication facility
57A does not intercept the conversation because
both application environments 52A and 52B share
the same log 72A and no log name exchange is
necessary.

By way of example the architected intersystem 
communication standard can be of a type defined 
by IBM's System Network Architecture LU 6.2 Ref- 
erence: Peer Protocols, SC31-6808 and chapter 5.3 
Presentation Services - Sync Point Verbs, pub- 
lished by IBM Corporation. The exchange of log 
names described in the current section addresses 
the process for executing, controlling, and optimiz- 
ing the exchange, not the architected protocol for 
the exchange. 

Exchange of log names is also required be- 
tween recovery facilities and resource managers of 
protected resources such as shared files or 
databases. Unlike protected conversations, where 
exchange of log names is not necessary when 
conversations take place in the same system (since 
they share a common sync point log), log name 
exchange is necessary for participating resource 
managers, even where resource managers are in 
the same system as the initiating application, be- 
cause resource managers maintain their own sync 
point logs. Unlike protected conversations, which
may utilize a communication protocol for establishing
protected conversations and log name exchange
as described by System Network Architecture
LU 6.2 cited above, protected resources utilize
non-protected conversations and a private
message protocol for those functions. Also, for
protected resources, it is not practical in all cases
to centrally intercept initial communications to the
resource manager by using a communication facility
as the interceptor because the communications
do not in all cases proceed through a communications
facility. One example of this is the
case of a resource manager 63A FIG. 2 that is in
the same system 50A as the application environment
52A and application 56A that uses its resource.
This situation does not require conversations
with the resource to pass through the communications
facility, but instead supports conversations
through the conversation manager 53A or
other local facilities. Another reason is to afford the
flexibility of supporting resource managers without
requiring them to entirely change their method of
communication with the users of their resource in
order to conform to the System Network Architecture
LU 6.2 communication protocols. Automatic
recovery processing from a sync point failure requires
that the names of the various participants'
logs remain the same as they were before the sync
point began, as was the case for protected conversations
described above.

FIG. 23 illustrates log name exchange for managers
of protected resources. In the illustrated embodiment,
system 50A comprises application environment
52A, associated resource adapter 62A,
recovery facility 70A, and a common resource recovery
log 72A. Although resource managers may
be local or remote, the illustration is for the local
case. As described in more detail below, the process
for the remote resource manager case is
basically the same except that communications facilities
are involved in completing the inter-system
communications. Whereas protected conversations,
whether local or remote, always utilize a communications
facility for communications, providing a
common intercept point for initiating log name exchange
for the pre-sync point agreement, resource
managers, as illustrated, may bypass the use of a
communication facility in the local case, and do not
have such a centralized intercept point to initiate
pre-sync point log name exchange.

A log name log 800A2 within log 800A is
associated with resource manager 63A and stores
the name of log 72A of the originating recovery
facility 70A. Also, a sync point log 800A1 within log
800A is associated with resource manager 63A and
stores the state of its protected resource in a sync
point procedure. As described in more detail below,
FIG. 23 illustrates the essential elements required
to ensure the timely exchange of log names
between a sync point manager and a participating
resource manager, as well as the ability to recognize
log name changes brought about by a failure
that forces re-initializing one or more of the logs
72A, 800A. When an application 56A sends a request
to resource manager 63A via resource adapter
62A (step 221 of FIG. 26), resource adapter 62A
calls the sync point manager 60A (step 222)
requesting:

1. The log name of the recovery facility's log 
72A, and 

2. The log_name_log resource identifier for
recovery facility 70A required to establish a
conversation to the recovery facility 70A for the
initial exchange of log names by resource manager
63A. This identifier uniquely identifies recovery
facility 70A and also permits recovery
facility 70A to distinguish incoming log name
exchange conversations from other conversations,
such as a sync point manager conversation
that uses a sync_point_log resource identifier
to connect as described below.

Sync point manager 60A then establishes a
conversation to the local recovery facility 70A using
a sync_point_log resource identifier (step 223
FIG. 26).

A resource identifier is used to identify a re- 
source within a system or more particularly to 
complete a conversation to the manager of a re- 
source in its current execution environment in that 
system. The manager of a resource uses a system 
control program facility to identify a resource to the 
system when the manager of the resource is initial- 
ized. The system control program enforces the 
uniqueness of these resource identifiers. In addition 
to resource manager 63 FIG. 2, other facilities may 
act as resource managers. An example is the re- 
covery facility 70, whose logs are considered re- 
sources for which it has resource identifiers. There 
are four types of resources, each of which is iden- 
tified by a type of resource identifier. The first of 
these is basically generic and can be extended to 
include any resource. The others are defined spe- 
cifically for resource recovery. 

1. object resource, identified by an object re- 
source identifier, which is the set of objects 78 
managed by a resource manager 63. This is the 
case of a generic resource manager and its 
resource, extendible to any resource, including 
sets of data files, queues, storage, or applica- 
tions. This type of resource identifier is used to 
establish a connection to the manager of the 
resource 63 in order to use the resource in 
some way, for example to open a file, start up 
an application, etc. that is owned by that re- 
source manager. 

2. object_recovery resource, identified by an
object_recovery resource identifier, which is a
resource manager log 800 and supporting procedures
for cooperating with a recovery facility
70 in the recovery from a failed sync point
procedure. This identifier is used by a recovery
facility 70 at the time of recovering from a failed
sync point to establish a conversation with the
manager of the resource 63 to exchange log
names and complete the sync point as a part of
automatic recovery.

3. sync_point_log resource, identified by a
sync_point_log resource identifier, which is the
log 72A FIGS. 19 and 23 managed by the
recovery facility 70A and the set of procedures
supporting the maintenance of that log 72A. This
identifier is used by a sync point manager 60
FIG. 2 to establish a conversation with its recovery
facility 70 in order to provide log information
on the status of sync points.

4. log_name_log resource, identified by a
log_name_log resource identifier, which is the
log name log 72A2 FIG. 23, managed by the
recovery facility 70A and the set of procedures
supporting the maintenance of that log 72A2.
This identifier is used by resource manager 63A
to establish a conversation with the recovery
facility 70A to exchange log names with the
appropriate recovery facility 70A.
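
The four resource types and the uniqueness rule enforced by the system control program can be pictured with a small registry. This is a sketch under assumptions: the class `ResourceRegistry`, its methods, and the exception name are hypothetical, and the text does not describe the system control program's actual interface.

```python
# The four resource types named in the text.
RESOURCE_TYPES = ("object", "object_recovery", "sync_point_log", "log_name_log")


class DuplicateResourceIdentifier(Exception):
    """Raised when an identifier is already in use (hypothetical error type)."""


class ResourceRegistry:
    """Stand-in for the system control program facility that tracks resource identifiers."""

    def __init__(self) -> None:
        self._owners = {}  # identifier -> (resource type, managing facility)

    def identify(self, identifier: str, resource_type: str, manager: str) -> None:
        """Called by the manager of a resource when it is initialized."""
        if resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource_type}")
        if identifier in self._owners:
            raise DuplicateResourceIdentifier(identifier)  # uniqueness is enforced
        self._owners[identifier] = (resource_type, manager)

    def connect(self, identifier: str) -> tuple:
        """Resolve an identifier to the type and manager a conversation should reach."""
        return self._owners[identifier]
```

Because a recovery facility registers separate identifiers for its sync_point_log and log_name_log resources, an incoming conversation can be routed to the right processing purely from the identifier used to connect.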

After establishing the connection to the recovery
facility 70A, sync point manager 60A obtains
the recovery information requested by resource
adapter 62A. This recovery information is returned
by sync point manager 60A to resource adapter
62A (step 224 FIG. 26) and is held by sync point
manager 60A for release to any other requesting
resource adapter. Next, resource adapter 62A FIG.
23 also provides the following sync point recovery
information to sync point manager 60A FIG. 23
(step 225 FIG. 26):

1. An object_recovery resource identifier which
can be used by recovery facility 70A FIG. 23 to
connect to resource manager 63A in the event
of a failure during sync point. This
object_recovery resource identifier permits the
resource manager to distinguish between incoming
conversations from resource adapter 62A
and from recovery facility 70A, each of which
requires different programs for processing. By
giving resource manager 63A, through its resource
adapter 62A, the capability of providing
its own object_recovery resource identifier,
rather than establishing a standard recovery resource
identifier for all resource managers, the
recovery facility 70A avoids conflicts with other
resource identifiers employed by this resource
manager 63A or any other resource manager,
maintaining a generalized, non-disruptive interface
for any resource manager to participate in
sync point processing.

2. An object resource identifier which can be
used by recovery facility 70A when there is a
sync point failure, to identify resource manager
63A which participates in the sync point and to
find the log name log 72A2 entry for it. This
identifier uniquely identifies the resource manager
for purposes of managing the sync point,
logging the sync point in case of a sync point
failure, and recovering from a failing sync point.

Following the application's 56A first request for
use of resource 78A, described above, resource
adapter 62A initializes a conversation to resource
manager 63A using its own object resource identifier,
and passes recovery information including the
log_name_log resource identifier of recovery facility
70A and the current name of log 72A, acquired
from the sync point manager (step 226 FIG.
26).

Although FIG. 23 illustrates only one recovery
facility 70A that is responsible for resource recovery,
a single resource manager may be involved
with many recovery facilities since the resource
may be used by applications in many systems,
each with its own recovery facility. This is illustrated
in FIG. 33 where resource manager 63E is
used by both application environment 52A in system
50A and application environment 52D in system 50D,
therefore requiring log name information from two
recovery facilities 70A and 70D.

To support recovery of a failed sync point, a 
resource manager 63A requires a log name log 
800A2 FIG. 23 entry for the name of each recovery 
facility log 72, where each such log name repre- 
sents a system 50 that utilizes the resource 
through one or more applications 56 and sync 
point managers 60. The log name log 800A2 FIG. 
23 for the participating resource manager 63A in- 
cludes the following information for each asso- 
ciated recovery facility 70: 

1. A log_name_log resource identifier which
identifies each associated recovery facility 70 (in
the case of FIG. 23, recovery facility 70A);

2. Recovery facility's 70 log name (in the case
of FIG. 23, the name of log 72A);

3. An exchange_done flag which indicates
when a log name has been successfully exchanged.
Although the exchange_done flag is
logically a part of the log name log 800A2 FIG.
23, it need not be written to non-volatile storage
because it is logically reset for each initiation of
the resource manager with which it is associated.
The purpose of the flag is to avoid the
exchange of log names for a particular recovery
facility 70A except for the first conversation from
the resource adapter 62A that is operating in the
system 50A of the recovery facility 70A. There
may be many application environments in a
system, all serviced by the same recovery facility
and each with a resource adapter with a
conversation to the same or different resource
manager. It is only necessary for a resource
manager to initiate an exchange of log names
upon the first instance of a conversation with
one of the resource adapters that are associated
with the same recovery facility. The
exchange_done flag is set to prevent subsequent
exchanges.

The remainder of FIG. 26 illustrates an algorithm
executed by resource manager 63A FIG.
23 to determine when to initiate a log name exchange.
Upon first receipt of the object resource
identifier (step 226), resource manager 63A searches
log name log 800A2 to determine if it has an
entry for recovery facility 70A identified by the
log_name_log resource identifier that was included
in the recovery information passed from resource
adapter 62A FIG. 23 (step 230 FIG. 26).
The resource manager uses the log_name_log
resource identifier received from the resource
adapter to search the log name log 800A2 FIG. 23.
If there is no entry, then resource manager 63A
initiates the log name exchange (step 232 FIG. 26).
If an entry is found in step 230 for recovery facility
70A FIG. 23, then resource manager 63A determines
if the exchange_done flag is set (step 234
FIG. 26). The exchange_done flag is set when a
successful log name exchange occurs, and remains
set until the resource manager terminates
abnormally or is shut down normally. If a resource
manager is unable to exchange log names due to a
failure to initiate a conversation with the recovery
facility, the resource manager terminates the conversation
initiated by its resource adapter. If the
exchange_done flag is not set, then resource
manager 63A FIG. 23 initiates the log name exchange
in step 232 FIG. 26. However, if the
exchange_done flag is set, resource manager 63A
FIG. 23 then compares the log name transmitted
by resource adapter 62A to the log name in the
entry (step 236 FIG. 26). If these two log names
are the same, then resource manager 63A does not
initiate the log name exchange (step 242 FIG. 26),
but if they are different, resource manager 63A
FIG. 23 initiates the log name exchange in step
232 FIG. 26. The foregoing algorithm assures a log
name exchange for any recovery facility the first
time that a resource manager communicates with a
resource adapter associated with that recovery facility.
Also, the algorithm assures a subsequent log
name exchange whenever the log names for the
recovery facility 70A FIG. 23 change. In the latter
case, the log name exchange is necessary, even
though the resource manager 63A gets the new
recovery facility log 72A name from the resource
adapter, since it is necessary to provide the log
name of the resource manager's log 800A to the
recovery facility 70A, whose log name log must be
synchronized with that of resource manager 63A.
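
The decision logic of FIG. 26 described above reduces to three checks. The following is a minimal sketch of that logic only; the `LogNameEntry` type, the `must_exchange` function, and the dictionary keyed by log_name_log resource identifier are hypothetical representations chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LogNameEntry:
    """One log name log 800A2 entry for an associated recovery facility."""
    recovery_log_name: str
    exchange_done: bool = False  # volatile: reset whenever the resource manager starts


def must_exchange(log_name_log: Dict[str, LogNameEntry],
                  log_name_log_id: str,
                  adapter_log_name: str) -> bool:
    """Return True when the resource manager should initiate a log name exchange."""
    entry = log_name_log.get(log_name_log_id)
    if entry is None:
        return True                  # step 230: no entry, so exchange (step 232)
    if not entry.exchange_done:
        return True                  # step 234: flag not set, so exchange (step 232)
    # step 236: exchange only if the recovery facility's log name has changed
    return entry.recovery_log_name != adapter_log_name
```

A resource manager that has just restarted finds its entries with the flag cleared, so its first conversation from each system triggers an exchange even though the stored log name may still be correct.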

The log name exchange of step 232 FIG. 26
between resource manager 63A FIG. 23 and recovery
facility 70A is further illustrated in FIG. 27, and
comprises the following steps (assume that log 72A
is the log):

1. Step 243 of FIG. 27: Resource manager 63A
FIG. 23 initiates a conversation 250 to recovery
facility 70A using a log_name_log resource
identifier obtained from resource adapter 62A;

2. Step 243 of FIG. 27: Resource manager 63A
FIG. 23 transmits the object resource identifier
that uniquely identifies resource manager 63A to
recovery facility 70A;

3. Step 244 of FIG. 27: Resource manager 63A
FIG. 23 transmits the log name for log 800A to
recovery facility 70A;

4. Step 245 of FIG. 27: Recovery facility 70A
FIG. 23 updates log name log 72A2 with the log
name of resource manager log 800A;

5. Step 246 of FIG. 27: Recovery facility 70A
FIG. 23 returns a response to resource manager
63A providing the log name of log 72A;

6. Step 247 of FIG. 27: Resource manager 63A
FIG. 23 updates log name log 800A2 with the
name of log 72A;

7. Step 248 of FIG. 27: Resource manager 63A
FIG. 23 sets the exchange_done flag in log
name log 800A2.
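
The seven steps above can be traced in a short simulation. This is a sketch under stated assumptions: conversation 250 is modelled as a direct method call, and the class names, method names, and sample log names are hypothetical.

```python
class RecoveryFacility:
    """Holds the recovery facility's own log name and its log name log (72A2)."""

    def __init__(self, log_name: str) -> None:
        self.log_name = log_name
        self.log_name_log = {}  # object resource identifier -> resource manager log name

    def receive_exchange(self, object_resource_id: str, rm_log_name: str) -> str:
        self.log_name_log[object_resource_id] = rm_log_name  # step 245
        return self.log_name                                 # step 246


class ResourceManager:
    """Holds the resource manager's own log name and its log name log (800A2)."""

    def __init__(self, object_resource_id: str, log_name: str) -> None:
        self.object_resource_id = object_resource_id
        self.log_name = log_name
        self.log_name_log = {}  # log_name_log resource identifier -> entry

    def initiate_exchange(self, log_name_log_id: str, facility: RecoveryFacility) -> None:
        # Steps 243-244: connect, then send own identifier and log name.
        returned = facility.receive_exchange(self.object_resource_id, self.log_name)
        # Steps 247-248: store the facility's log name and set exchange_done.
        self.log_name_log[log_name_log_id] = {
            "log_name": returned,
            "exchange_done": True,
        }


facility_70a = RecoveryFacility("LOG72A.v1")
manager_63a = ResourceManager("RM63A", "LOG800A.v1")
manager_63a.initiate_exchange("RF70A.LNL", facility_70a)
```

After the call each side holds the other's current log name, which is the pre-sync point agreement that later recovery depends on.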

When application 56A FIG. 23 requests a sync 
point from sync point manager 60A, sync point 
manager 60A sends the above object_recovery 
resource identifier and object resource identifier to 
recovery facility 70A, where they are stored in sync
point log 72A1 along with the information describing
the state in the sync point process. If a failure
occurs during a sync point, recovery facility 70A is 
activated to perform the operations necessary to 
complete the sync point procedure. If resources 
were participating in the failing sync point, recovery 
information in the associated recovery facility's 
sync point log entry is available to permit contact 
with those resources in order to accomplish recov- 
ery. For example, if application 56A goes down 
during a two-phase commit operation, then recov- 
ery facility 70A is activated and subsequently ex- 
changes log names with resource manager 63A. 
When this second exchange indicates that log 
names have not changed since the sync point was 
initiated, recovery facility 70A knows that it can 
continue with the recovery of the sync point. A log 
name mismatch in the exchange would indicate 
that log information required for automatic recovery 
has been lost and therefore automatic recovery 
should not be attempted. The recovery facility 70A 
initiates the second log name exchange and asks 
resource manager 63A what state or phase it was 
in prior to the failure. Even though the initial ex- 
change of log names was initiated by resource 
manager 63A, as described above, the exchange of 
log names required after the failure is initiated by 
recovery facility 70A as follows: 

1. For each resource for which there is recovery
information in sync point log 72A1 associated
with the failing sync point, recovery facility 70A
identifies the log name log entry for the resource
by using the object resource identifier
found in the sync point log 72A1 entry as a
search argument applied to log name log 72A2
entries, yielding the resource's log name. This is
illustrated in FIG. 25.

2. The recovery facility establishes a conversation
252 FIG. 23 to resource manager 63A using
the object_recovery resource identifier found in
the sync point log entry.

3. Recovery facility 70A sends its own log
name, the log_name_log resource identifier
(unique identifier of recovery facility 70A), and
the resource's log name to resource manager
63A using conversation 252.

In response, resource manager 63A performs
the following steps:

1. Resource manager 63A recognizes that the
conversation from recovery facility 70A is intended
for the purpose of sync point recovery
because the conversation includes the
object_recovery resource identifier.

2. Resource manager 63A uses the
log_name_log resource identifier sent by recovery
facility 70A to verify the entry in log
name log 800A2 that is associated with recovery
facility 70A.

3. Resource manager 63A verifies that the log
name of the resource transmitted by recovery
facility 70A corresponds with the log name of its
own log 800A.

4. Resource manager 63A returns an error signal
to recovery facility 70A on conversation 252
if it finds no entry in log name log 800A2 associated
with recovery facility 70A.

5. Resource manager 63A sends an error signal
to recovery facility 70A on conversation 252 if
either of the verification steps described above
fails.

An error condition detected in the exchange of 
log names at the beginning of recovery prevents 
the continuation of the automatic sync point failure 
recovery procedure of recovery facility 70A. Such 
an error condition indicates that a failure of one or 
more of the participating logs occurred concurrently
with the sync point failure. The loss of a log
implies the loss of all information in the log and the
assignment of a new log name. Such failure requires
manual intervention and heuristic decisions
to resolve the failing sync point. Detection of such 
an error condition is the main purpose of the log 
name exchange process implemented after sync 
point failure. 
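
The checks a resource manager applies to the recovery-time exchange can be sketched as one function. This is an illustration, not the described implementation: the function name and return strings are hypothetical, and reading the verification of the log name log entry as a comparison of the stored recovery facility log name against the one just sent is an assumption.

```python
from typing import Dict


def verify_recovery_exchange(rm_log_name_log: Dict[str, str],
                             own_log_name: str,
                             log_name_log_id: str,
                             facility_log_name: str,
                             claimed_rm_log_name: str) -> str:
    """Return "ok" if automatic recovery may proceed, else an error description.

    rm_log_name_log maps a log_name_log resource identifier to the recovery
    facility log name recorded at the pre-sync point exchange.
    """
    recorded = rm_log_name_log.get(log_name_log_id)
    if recorded is None:
        return "error: no entry for recovery facility"
    # Assumption: verifying the entry means the recorded name still matches.
    if recorded != facility_log_name:
        return "error: recovery facility log name changed"
    if claimed_rm_log_name != own_log_name:
        return "error: resource manager log name mismatch"
    return "ok"
```

Any result other than "ok" corresponds to the error signal on conversation 252, after which automatic recovery stops and manual intervention is required.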

Similar to the case of the local resource manager
63A illustrated in FIG. 23, FIG. 24 illustrates
log name exchange where the resource manager
63E of system 50D is remote from application
environment 52A and the application 56A of system
50A that uses the resource managed by resource
manager 63E. Communications between remote
resource manager 63E and local application
56A and recovery facility 70A are made via inter-system
communications facilities 57A and 57D,
rather than through intra-system communications
support provided by the system control program.
Sync point manager 60A uses recovery facility 70A
to manage the sync point and log name logs 72A
required for recovery from a failing sync point.
Resource manager 63E maintains its own resource
manager logs 800E. The communications path utilized
at the time of pre-sync point agreements and
at the time of resynchronization for recovery of
failing sync points is between resource manager
63E and recovery facility 70A of system 50A. The
recovery facility 70D (not shown) of system 50D is
not utilized in this case since the originating sync
point manager, application and associated recovery
facility are not local to system 50D, but are remote
in system 50A. The only difference between the
log name exchange process for local and remote
resource managers is that communications between
a remote resource manager 63E and resource
adapter 62A and recovery facility 70A are
made via communications facilities 57A and 57D
instead of through intra-system communications
services of the local system control program. Otherwise
the exchange of log names process is the
same as described above with reference to FIG.
23. The communications facilities 57A and 57D do
not play a role in determining when to exchange
log names with a remote log, i.e. the communications
facilities do not intercept conversations as
was the case for protected conversations in FIG.
19.

RECOVERY FACILITY FOR INCOMPLETE SYNC
POINTS FOR DISTRIBUTED APPLICATION

Recovery Facility 70A illustrated in FIG. 2 is
used to complete a sync point that encounters a
failure. In most cases the recovery
(resynchronization) is accomplished automatically
by a Recovery Facility 70A, which recognizes the
failure and then acts as a surrogate for the local
sync point manager 60A to complete the sync
point normally through alternate or reacquired communications
to participants in the sync point. Failures
include a failing sync point manager 60A, a
failure in communications between a sync point
manager 60A and its recovery facility 70A, failure
of communications with or failure of an application
partner 56D or resource manager 63, and failure of
the recovery facility 70A.

By way of example the architected intersystem
communication standard can be of a type defined
by IBM's System Network Architecture LU 6.2
Reference: Peer Protocols, SC31-6808 and chapter
5.3 Presentation Services - Sync Point Verbs, published
by IBM Corporation.

Recovery facility 70A serves all of the application
execution environments 52A, B, C and participating
sync point applications within system 50A
and utilizes common recovery facility log 72A for
the purpose of sync point recovery. Typically, there
are many systems interconnected with each other
by communication facilities 57 and therefore, many
recovery facilities 70 can be involved in recovery
processing.

FIG. 33 illustrates various recovery situations
involving systems 50A, 50D and 50F. Each application
execution environment 52A, B, D, F, and G
executes an application 56A, B, D, F, and G respectively
(not illustrated) which utilizes a sync
point manager 60A, B, D, F, and G respectively
(not illustrated) for the purposes of coordinating
resource recovery. Each sync point manager uses
the recovery facility in its system to manage the
sync point and log name logs required for recovery
from a failing sync point. For example, the sync
point managers in application environments 52A
and 52B use the recovery facility 70A to record
sync point recovery information in recovery facility
log 72A. Resource managers 63A, B, D, E, F, and
G maintain their own sync point and log name logs
800A, B, D, E, F, and G respectively. In the illustrated
examples, scopes of sync points are indicated
by solid lines with arrows. Although sync
points may be initiated by any participant and the
scope of a sync point is dynamic, the illustration is
static for simplicity of illustration. For the illustrated
static cases, sync points flow between application
environments 52B to 52D to 52F via the associated
sync point managers and protected conversation
adapters (not shown) via communication solid lines
801 and 802; and from application environments
52A, B, D, F and G via the associated sync point
managers and resource adapters to the resource
managers 63A, B, D, E, F and G via communication
solid lines 803A-1, 803A-2, 803B, 803D, 803E,
803F and 803G, respectively.

Three sync point scopes are included in the 
FIG. 33 illustration. The first involves a single ap- 
plication environment 52A including sync point 
manager 60A and utilizes two resource managers 
63A and 63E. The second sync point scope involves
three application environments 52B, 52D and 52F, each
involving various participating resource managers:
63B for 52B; 63D, E for 52D; and 63F, G for 52F, as
further illustrated by the sync point tree of
FIG. 34. The third sync
point scope involves application environment 52G 
and a resource manager 63G. 

The dotted lines in FIG. 33 show communications paths
employed at the time of pre-sync point agreements and at
the time of resynchronization for recovering a failing
sync point (refer to the section "Log Name Exchange For
Recovery of Protected Resources" below). For resource
managers, the pre-sync point and resynchronization path
is between the resource manager and the recovery
facility of the system of the originating application
environment (i.e., the user, for example an updater, of
the resource managed by the resource manager); for
example, between resource manager 63E and recovery
facility 70A via path 804A-2 when application
environment 52A is the originator (user of the resource
managed by resource manager 63E), and between resource
manager 63E and recovery facility 70D via path 804D when
application environment 52D is the originator.
A sync point propagates through participants of the
sync point in a cascaded manner forming the sync point
tree illustrated in FIG. 34. Applications 56B, 56D and
56F communicate with each other via protected
conversations 801 and 802 managed by protected
conversation adapters 64B, D and F (not shown),
respectively. Applications 56B, 56D and 56F utilize
resource adapters 62B, D and F (not shown),
respectively, which use non-protected conversations
803B, 803D, 803E, 803G, and 803F to communicate with the
resource managers 63B, D, E, G and F, respectively. This
tree includes the sync point initiator application 56B
whose participants are a resource manager 63B and a
distributed application 56D, which in turn has
participants resource managers 63E, 63D and distributed
application 56F, which in turn has participant resource
managers 63G and 63F.

For purposes of sync point recovery, a sync point log,
72D for example, is maintained by sync point manager 60D
(through recovery facility 70D, not shown) with
information about its immediate predecessor in the sync
point tree, application 56B in environment 52B, and the
immediate participants known to it, resource managers
63E, 63D and application 56F in application environment
52F, but maintains nothing in its sync point log 72D
concerning any of the other sync point participants
63B, 63G or 63F.
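
The limited view that each sync point log keeps of the tree can be illustrated with a small sketch. The dictionary layout and the function names below are hypothetical and chosen only for illustration; the node labels are those of the FIG. 34 tree as described above.

```python
# Hypothetical model of the FIG. 34 sync point tree: each application
# maps to its immediate participants, in the order given in the text.
TREE = {
    "56B": ["63B", "56D"],
    "56D": ["63E", "63D", "56F"],
    "56F": ["63G", "63F"],
}


def immediate_log_view(node, tree=TREE):
    """Return what the node's sync point log records: its immediate
    predecessor (None for the sync point initiator) and its immediate
    participants."""
    predecessor = next((p for p, kids in tree.items() if node in kids), None)
    return {"initiator": predecessor, "participants": list(tree.get(node, []))}


def unknown_to(node, tree=TREE):
    """Participants elsewhere in the tree about which the node's sync
    point log records nothing."""
    everyone = set(tree) | {k for kids in tree.values() for k in kids}
    view = immediate_log_view(node, tree)
    known = {node, view["initiator"], *view["participants"]}
    return sorted(everyone - known)


print(immediate_log_view("56D"))
print(unknown_to("56D"))
```

For node 56D the view holds 56B as the immediate predecessor and 63E, 63D and 56F as participants, while 63B, 63G and 63F remain unknown to it, matching the description of log 72D.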

FIG. 35 is a high level flowchart 298 of the principal
elements for sync point recovery. It represents the two
parts of a recovery facility 70: pre-sync point recovery
agreement (Steps 299, 300, 301 and 302) and recovery
from sync point failure (Steps 303-306).
Prior to a sync point occurrence there must be
agreement between the participants in the sync point
concerning the identity of the logs associated with the
sync point and the current level of their respective
logs 72. (Refer to the foregoing section entitled "Log
Name Exchange For Recovery of Protected Resources").
This pre-sync point recovery agreement is important in
case of a sync point failure to ensure that the logs
used to recover from the sync point failure are the same
ones and are at the same level as they were before the
sync point was initiated. If, between the time of the
pre-sync point recovery agreement (exchange of log names
described above) and the occurrence of a sync point
failure, one or more of the participants has a log
failure and must begin with a new log, then the
automatic recovery procedures associated with the
failing log will fail.

The exchange of log names between the sync point
participants and the recording of log names in the logs
72 make this information available for validation in
the case of a sync point failure. These exchanges are
initiated upon the detection of the first establishment
of communications over a particular path. Because
communications can be initiated locally or remotely, the
recovery facility 70 supports both local detection
(Steps 299 and 300) requiring an outgoing log name
exchange and remote detection (Steps 301, 302) requiring
an incoming log name exchange.

The recovery facility 70 provides automatic recovery
from sync point failure and includes Step 303 - the
various events that may occur to initiate the recovery
procedure, Step 304 - the initialization of the recovery
procedure, Step 305 - the actual recovery, referred to
as a recovery driver process, and Step 306 - the
termination of the recovery procedure. The recovery
facility 70 includes asynchronous handling of multiple
sync point failure events.

FIG. 36 shows more detail for the "Recovery From
Syncpoint Failure" portion of the recovery procedure
(Steps 303-306). Five types of events (Step 303)
initiate the recovery procedure:

(1) A sync point request event 311 occurs as a result
of receiving a request from a sync point manager 60
when it encounters a communications failure with one or
more of its sync point participants (e.g., resource
managers 63). The sync point manager 60 initiates the
recovery procedure explicitly by sending a request to
the recovery facility 70 using the same path that is
used for logging the sync point activity. The request
includes a description of the failing participant(s)
using the corresponding sync point identifier(s). An
event occurs for each sync point identifier that is
specified.

(2) A recovery request event 312 occurs at a target
recovery facility 70 (one that represents a participant
in a failing sync point) when a recovery process that
represents a sync point initiator sends a recovery
request to one of its participants.

(3) A communications failure event 313 occurs in a
recovery facility 70 when there is a broken connection
on the path used to send log information from the
application environment to that recovery facility. An
event occurs for each sync point that is in progress
for the application environment that was utilizing the
failed path.

(4) A recovery facility failure event 314 occurs when
there is a termination failure for a recovery facility
such that sync point logging cannot take place. An
event occurs for each incomplete sync point at the time
of the failure and the events occur when the recovery
facility is restarted.

(5) A recovery administrative request event 315
results from an administrative command that is used to
repair sync point failures that have encountered
prolonged delays or serious failures during the normal,
automatic recovery procedure. The request manually
supplies response state information that is normally
available through automatic recovery protocols. The
appropriate response state information is determined
off-line by administrators from manual investigation of
sync point log records.

When the recovery procedure is initiated, Step 
304 starts an asynchronous sub-process for each 
recovery event received. A participation driver sub- 
process (Step 317) initiates communications and 
accepts responses from each downstream partici- 
pant in the failing sync point for the purpose of 
agreeing upon a consistent resolution. This com- 
munication involves the participation driver sending 
a message that includes the recovery server log 
name and a sync point state such as commit or 
back out, and then receiving a response from the 
participant that includes an indication of agreement 
or disagreement with the recovery server log name 
sent, a participant log name, and a response to the 
sync point state, such as committed or backed out. 
The participation driver invokes a response pro- 
cessing driver (Step 318) for each response mes- 
sage thus received. The response processing driv- 
er analyzes the response and completes all re- 
quired actions and recording. This involves check- 
ing the participant's log name against the one 
recorded for the participant in log 72 to verify that 
the participant has not had a log failure since the 
sync point began. It further involves posting the 
sync point response to the recovery facility log 72. 
Then the response processing driver returns to the 
participation driver. When all responses are re- 
ceived and processed, an initiator response driver 
(Step 319) is invoked to build and send a response 
to the recovery facility that represents the initiator 
of the sync point, permitting it, in turn, to resolve 
the sync point with its initiator, if applicable. The 
response to the initiator is similar to the response 
that the current recovery facility received from its 
participants, involving a return of the current
recovery facility log name and the response sync point
state, such as committed or backed out, that is based
on the results from all of its own sync point
participants. Finally, a recovery terminator (Step 306)
terminates all involved processes. 

FIG. 37 illustrates control structures required for
the recovery procedure. A recovery control structure
340 contains information about a particular recovery
event and exists throughout the current processing of
the event. It contains information that is common to
the recovery of all participants for the related sync
point. It also contains anchors to an associated entry
342 in log 72 and to a chain of participant control
structures 344, each of which contains the current
recovery status and path identifier for the recovery
participant. The sync point log entry 342 has header
information 348 that is common to the local sync point
participants as well as body information 350 about the
immediate initiator and each immediate participant.
Finally there is a log name log entry 354 which contains
initial log name exchange information for each sync
point path known to the recovery facility that is
associated with the sync point log.

The purpose of these fields is further indicated by
the structural flows that follow. Some fields require
preliminary description: "Chain" fields are used to
interconnect structures of like type.

"State" fields: 

SPL_SYNCPOINT_STATE is the overall sync point state.
Once the sync point has reached phase two, this state
permits driving downstream participants to resolve the
sync point. If the sync point was in phase one at the
time of failure, recovery request event processing may
change this state according to the direction provided
by the initiator recovery facility.

SPL_PARTICIPANT_STATE is updated with 
response states from participants by the Response 
Processing Driver 318. 

RCS_PARTICIPANTS_STATE is set by the various recovery
event processing for the purpose of driving the affected
downstream sync point participants.

RCS_INITIATOR_RESPONSE_STATE is initialized by various
recovery events processing 311-315 along with
RCS_PARTICIPANTS_STATE, but under some circumstances is
also updated by the response processing driver 318
where the response to the initiator is to reflect
unusual and unexpected responses from participants that
result from unilateral decisions known as heuristic
responses. This field is used by the initiator response
driver 319 to provide the state returned to the
initiator.

"Path ID" fields: 

RCS_PATH_ID is the path associated with an incoming
event and may be used to respond to the originator of
that event.

PCS_PATH_ID is the path associated with a participant
in a failed sync point. It would be the same as the
SPL_RECOVERY_PATH_ID for participants.

SPL_RECOVERY_PATH_ID is the path to get to the
participant or the initiator as needed by the sync
point recovery facility.

SPL_SYNCPOINT_PATH_ID is the path used by sync point
processing in the application environment to supply
sync point log information to the local recovery
facility's sync point log.

"Flags": 

RCS_RESPOND_TO_INITIATOR indicates that a response
should be generated to the immediate initiator of the
sync point recovery facility;

RCS_RETURN_TO_CALLER is used for controlling
synchronous return from a sync point recovery request
when the wait indicator (described below) is used;

RCS_ERASE_LOG is used to record that a 
recovery administrative request included a PURGE 
option, causing the sync point log entry to be 
erased at the conclusion of processing; and 

SPL_INITIATOR indicates that the information
in the particular sub-entry of the BODY of the sync 
point log entry concerns the initiator of the sync 
point; otherwise it concerns a participant. 

"Miscellaneous" Fields: 

RCS_FUNCTION_ID is used by the sub-pro- 
cess starter service to determine the function to be 
invoked to execute in the new process. 

SPL_SYNCPOINT_ID is the unique identifier
of the sync point and the node in the sync point 
tree. Each sync point log entry has a distinct sync 
point identifier. 

SPL_SUSPENDED_PROCESS_ID is set by
the timer wait service to identify the suspended 
process and reset when the timed wait interval 
expires. It is used to invoke the resume service to 
prematurely terminate the timed wait for a particu- 
lar process. 

PCS_STATUS is used to record the status of
communications with each participant in the recovery
procedure. It has four possible values: RESTART,
CONNECTED, RETRY, and RESPONDED.

LL_LOGNAME is the log name of the sync 
point participant. One is recorded for each path 
involved in any potential sync point communication. 
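
As a reading aid, the fields described above can be gathered into a minimal sketch of the FIG. 37 control structures. This is an illustrative assumption, not the layout used by the recovery facility: the class names, the Python types, the defaults, and the placement of each SPL field in the header or body of the log entry are all hypothetical. Only the field names (lower-cased here) and the four PCS_STATUS values come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PcsStatus(Enum):
    """The four PCS_STATUS values named in the text."""
    RESTART = "RESTART"
    CONNECTED = "CONNECTED"
    RETRY = "RETRY"
    RESPONDED = "RESPONDED"


@dataclass
class SplBodyEntry:
    """One sub-entry of the BODY of a sync point log entry (assumed layout)."""
    spl_recovery_path_id: str
    spl_participant_state: Optional[str] = None
    spl_initiator: bool = False  # True: sub-entry describes the initiator


@dataclass
class SyncPointLogEntry:
    """Sync point log entry 342: header fields plus body sub-entries."""
    spl_syncpoint_id: str
    spl_syncpoint_state: str
    spl_syncpoint_path_id: str
    spl_suspended_process_id: Optional[int] = None
    body: List[SplBodyEntry] = field(default_factory=list)


@dataclass
class ParticipantControlStructure:
    """Participant control structure 344."""
    pcs_path_id: str
    pcs_status: PcsStatus = PcsStatus.RESTART


@dataclass
class RecoveryControlStructure:
    """Recovery control structure 340, anchoring a log entry and a chain
    of participant control structures (modelled here as a list)."""
    log_entry: SyncPointLogEntry
    rcs_participants_state: str
    rcs_initiator_response_state: Optional[str] = None
    rcs_path_id: Optional[str] = None
    rcs_function_id: Optional[str] = None
    rcs_respond_to_initiator: bool = False
    rcs_return_to_caller: bool = False
    rcs_erase_log: bool = False
    participants: List[ParticipantControlStructure] = field(default_factory=list)
```

The "Chain" fields are represented by ordinary Python lists, which is a simplification of the linked structures the text describes.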

FIG. 38 is a flowchart which illustrates the
processing (Step 300) triggered by event Step 299
(corresponds to same step in FIG. 35) and executed by
recovery facility 70 when a sync point communication is
initiated for the first time during the activation of
the recovery facility. It initiates a process (Step 359)
for exchanging log names between the local recovery
facility and the recovery facility associated with the
target of the sync point communication.

A receive service (Step 361) provides the input data
(path identifier) for the process. The log name log is
used (Step 362) to retrieve the log name associated
with the path for use in the exchange of log names. In
the log name exchange, the expected log name for the
target is sent along with the log name of the local
recovery facility. The log name exchange request is
sent (Steps 363-365) and the response is processed
(Step 366). When the exchange is successful, the log
name log is updated with the new or changed target log
name. Then the recovery facility disconnects from the
path (Step 367) and invokes a first communication
service to record either that the exchange was
successful, to prevent future exchange events for the
processed path, or that it was unsuccessful, to insure
continued suspension of communications and attempts to
complete an exchange of log names (Step 368).

FIG. 39 is a flowchart which illustrates in detail
Step 302, triggered by event Step 301 (corresponds to
same step in FIG. 35), that takes place as a result of
the arrival of an incoming log name exchange request.
After an initiation (Step 370), the log name and path
identifier are received (Step 371) and the log name log
is updated accordingly (Steps 371-373). If there are
any recovery processes associated with the path that
are in
suspension (timer-wait) (Step 374), then the recov- 
ery facility 70 invokes the resume service for each 
to cause resumption of the processes. The log 
name exchange response (Step 374A) includes the 
local log name and an indication of 
agreement/disagreement with the exchange data 
received. The response is sent to the originator 
(Step 375) and, for successful exchange, the first 
communications service is invoked (Step 376) to 
prevent subsequent exchange of log names for the 
path. 
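
The outgoing exchange of FIG. 38 and the incoming exchange of FIG. 39 can be sketched together as follows. This is a simplified illustration under stated assumptions: the message layout, the function names, and in particular the agreement rule (the receiver agrees when the sender expects no name yet or expects the receiver's current log name) are hypothetical, since the text does not spell out how agreement is decided. Resumption of suspended processes and the first communication service are omitted.

```python
def handle_incoming_exchange(local_log_name, log_name_log, path_id, request):
    """Incoming side (FIG. 39): record the sender's log name for the path
    and answer with the local log name and an agreement indication.
    Assumed rule: agree if the sender expects nothing or our current name."""
    agrees = request["expected_log_name"] in (None, local_log_name)
    if agrees:
        log_name_log[path_id] = request["sender_log_name"]
    return {"log_name": local_log_name, "agrees": agrees}


def outgoing_exchange(local_log_name, log_name_log, path_id, send):
    """Outgoing side (FIG. 38): send the local log name together with the
    log name expected for the target, then update the log name log only
    when the exchange is successful. `send` is a hypothetical transport."""
    request = {
        "sender_log_name": local_log_name,
        "expected_log_name": log_name_log.get(path_id),  # None on first contact
    }
    reply = send(request)
    if reply["agrees"]:
        log_name_log[path_id] = reply["log_name"]
    return reply["agrees"]


# Facility A contacts facility B for the first time.
a_log, b_log = {}, {}
ok = outgoing_exchange(
    "LOG.A", a_log, "path-to-B",
    lambda req: handle_incoming_exchange("LOG.B", b_log, "path-to-A", req),
)
print(ok, a_log, b_log)
```

If facility B later loses its log and restarts under a new log name, a repeated exchange from A carries the old expected name, B disagrees, and A's log name log is left unchanged, which corresponds to the unsuccessful case that keeps communications suspended.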

FIG. 40 is a flowchart which illustrates the 
procedure for an explicit request event (Step 311 
corresponds to same step in FIG. 35) from an 
active sync point to perform sync point recovery. 
This would occur if there were a partial failure in an 
application environment 52 requiring recovery from 
a sync point but not terminating the application or 
sync point. The request from the sync point man- 
ager in the application environment 52 provides the 
sync point identifier and the direction (commit or 
back-out) to be used to complete the failing sync 
point. Additionally, for each failed participant in the 
sync point, the recovery path identifier is supplied. 
The required action can complete synchronously 
(wait indicator supplied) or asynchronously as de- 
scribed in more detail below (no wait indicator 
supplied). 

The arrival of this request is an event that
initiates (Step 379) a procedure (Step 380) which
requires searching the sync point log (Step 381) for an
entry that has a matching SPL_SYNCPOINT_ID. When found,
a recovery control structure is built (Step 382) with
an anchor to the sync point log entry and
RCS_PARTICIPANTS_STATE set to the direction passed in
the request. Additionally, the RCS_RESPOND_TO_INITIATOR
flag setting prevents sending a response to a recovery
facility representing the initiator of the sync point
and, in the case where the wait indicator is passed,
the RCS_RETURN_TO_CALLER flag is set, causing the
response to the request to be deferred until the
recovery procedure is completed. Without the wait
indicator, there is a response to the initiating
request after the recovery procedure is started. Next,
an agent control structure is built (Step 383) for each
participant, represented by the path identifiers
provided, and PCS_STATUS is initialized to RESTART. The
chain of agent control structures is anchored to the
recovery control structure. Next, recovery
initialization is invoked (Step 384), passing the
recovery control structure. When returning from the
initialization, there is a response to the invoker
(Step 385). When the wait indicator was used, the
invoker is advised of completion; otherwise, the
notification is either completion or an indication that
the request processing was begun (will complete later).
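
The construction of the recovery control structure for an explicit sync point request (Steps 381-383) might be sketched as below. Dictionaries stand in for the control structures, and the request keys (`syncpoint_id`, `direction`, `failed_paths`, `wait`) and the function name are hypothetical; the text names the information carried by the request but not its format.

```python
def build_rcs_for_request(sync_point_log, request):
    """Search the sync point log for the matching SPL_SYNCPOINT_ID and
    build a recovery control structure for it, or return None when no
    entry matches (behaviour on a miss is an assumption)."""
    entry = next(
        (e for e in sync_point_log
         if e["SPL_SYNCPOINT_ID"] == request["syncpoint_id"]),
        None,
    )
    if entry is None:
        return None
    return {
        "log_entry": entry,  # anchor to the sync point log entry
        "RCS_PARTICIPANTS_STATE": request["direction"],
        "RCS_RESPOND_TO_INITIATOR": False,  # no response to an initiator
        "RCS_RETURN_TO_CALLER": bool(request.get("wait", False)),
        # One control structure per failed participant path, all RESTART.
        "participants": [
            {"PCS_PATH_ID": path, "PCS_STATUS": "RESTART"}
            for path in request["failed_paths"]
        ],
    }
```

With the wait indicator supplied, RCS_RETURN_TO_CALLER is set and the response to the request is deferred until recovery completes; without it the request is acknowledged once recovery has been started.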

FIG. 41 is a flowchart illustrating the procedure that
results from an event initiated (Step 312) by receiving
a recovery request from a recovery facility that
represents the immediate initiator in a failing sync
point. This initiates (Step 388) a procedure (Step 390)
which invokes the receive service (Step 391) to obtain
the path ID associated with the incoming request, the
sync point identifier for the failing sync point (which
also identifies the local node in that sync point), the
log name associated with the originator's sync point
log, the log name that the initiator's recovery
facility expects to match with the name of the sync
point log for the current recovery facility, and the
direction (state) to be used to resolve the failure.

The path identifier is used to find an entry in the
local log name log (Step 392). Then LL_LOGNAME is
verified with the originator's log name and the local
sync point log name is verified with the expected log
name passed (Step 393). Next, the sync point log is
searched for an entry with the matching sync point
identifier (Step 394). When found, a recovery control
structure is built (Step 395) with an anchor to the
sync point log entry and RCS_PARTICIPANTS_STATE set to
the direction passed in the request. Additionally, the
RCS_RESPOND_TO_INITIATOR flag is set to indicate that a
response to the initiator is appropriate and the
RCS_PATH_ID is set to the path identifier of the
initiator's incoming path. The RCS_RETURN_TO_CALLER
flag is set to prevent return to the calling sync point
manager 60 in the application environment 52. Finally,
recovery initialization is invoked (Step 396), passing
the recovery control structure.

FIG. 42 is a flowchart illustrating the processing 
(Step 400) that results when there is a failure in the 
path (Step 313) between the application environ- 
ment 52 and the recovery facility 70 such that sync 
point logging is inoperative. After the process is 
initiated (Step 399), the sync point log is searched 
for entries that satisfy both of the following con- 
ditions (Step 401): 

(1) SPL_SYNCPOINT_PATH_ID matches the
failing path. 

(2) SPL_SYNCPOINT_STATE indicates that 
the immediate sync point participants can be 
driven to complete the sync point. This is in- 
dicated by one of the following: 
SPL_SYNCPOINT_STATE indicates sync 
point phase one and there has not been a 
response to the initiator's "prepare", or 
SPL_SYNCPOINT_STATE indicates sync 
point phase two. 

Where these conditions are met, a recovery control
structure is built (for each such log entry) (Step 402)
with an anchor to the sync point log entry, where both
RCS_INITIATOR_RESPONSE_STATE and RCS_PARTICIPANTS_STATE
are derived from the SPL_SYNCPOINT_STATE. In some
cases, SPL_PARTICIPANT_STATE also affects the setting
of RCS_INITIATOR_RESPONSE_STATE. This occurs, for
example, when a response from a participant had
indicated a unilateral (heuristic) action.
Additionally, the RCS_RESPOND_TO_INITIATOR flag setting
prevents sending a response to a recovery facility
representing the initiator of the sync point and the
RCS_RETURN_TO_CALLER flag setting indicates that there
is no calling sync point manager to which to return.
The resulting recovery control structures are chained
together. Finally, recovery initialization is invoked
(Step 403), passing the chain of recovery control
structures.
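
The log search of Step 401 amounts to a filter over the sync point log. The sketch below assumes a dictionary per log entry, the state encodings "PHASE_ONE" and "PHASE_TWO", and a boolean `prepare_answered` that records whether a response to the initiator's "prepare" has been made; all three are hypothetical stand-ins for information the text says is derivable from SPL_SYNCPOINT_STATE.

```python
def can_drive_participants(entry):
    """Condition (2): the immediate participants can be driven to complete
    the sync point, i.e. phase two, or phase one with no response yet to
    the initiator's "prepare"."""
    state = entry["SPL_SYNCPOINT_STATE"]
    if state == "PHASE_TWO":
        return True
    return state == "PHASE_ONE" and not entry["prepare_answered"]


def entries_for_failed_path(sync_point_log, failing_path):
    """Select the log entries that satisfy both conditions (1) and (2)."""
    return [
        entry for entry in sync_point_log
        if entry["SPL_SYNCPOINT_PATH_ID"] == failing_path
        and can_drive_participants(entry)
    ]
```

One recovery control structure would then be built for each selected entry and the results chained together before recovery initialization is invoked.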

FIG. 43 is a flowchart which illustrates processing
(Step 408) that results when there is a failure of the
recovery facility 70 (Step 314). When the recovery
facility 70 is restarted (Step 407), the log 72 is
searched (Step 411) for all entries that satisfy the
following condition:

SPL_SYNCPOINT_STATE indicates that the immediate sync
point participants can be driven to complete the sync
point. This is indicated by one of the following:
SPL_SYNCPOINT_STATE indicates sync point phase one and
there has not been a response to the initiator's
"prepare", or SPL_SYNCPOINT_STATE indicates sync point
phase two.

Where this condition is met, a recovery control
structure is built for each such log entry (Step 412)
with an anchor to the sync point log entry, where both
RCS_INITIATOR_RESPONSE_STATE and RCS_PARTICIPANTS_STATE
are derived from the SPL_SYNCPOINT_STATE. In some
cases, SPL_PARTICIPANT_STATE also affects the setting
of RCS_INITIATOR_RESPONSE_STATE. This occurs when a
response from a participant had indicated, for example,
a unilateral (heuristic) action. Additionally, the
RCS_RESPOND_TO_INITIATOR flag setting allows for
sending a notification to the recovery facility
representing the initiator of the sync point and the
RCS_RETURN_TO_CALLER flag setting indicates that there
is no calling process to which to return. The resulting
recovery control structures are chained together.
Finally, recovery initialization is invoked (Step 413),
passing the chain of recovery control structures.

FIG. 44 is a flowchart which illustrates a support
(Step 409) for recovery administrative requests (Step
315) which permits manually initiated repair of stalled
automatic sync point recovery due to failure to
initiate a conversation with a sync point participant
(participant case) for downstream resolution or a sync
point initiator (initiator case) for providing the
direction (state) to drive its participants to
completion.

In the participant case, the request provides a
substitution for the participant's response so that the
recovery facility 70 that is driving the downstream
participants can complete the recovery without actually
communicating with the participant. In the initiator
case, the request provides a substitution for the
normal recovery initiated recovery request event (as
described in FIG. 41) that cannot occur due to the
inability of the initiator to connect to the local
recovery facility 70 to drive its participants without
the event depicted in FIG. 41.

In the initiator case, after the support is initiated
(Step 408), a recovery control structure is built (Step
414), setting the RCS_INITIATOR_RESPONSE_STATE and
RCS_PARTICIPANTS_STATE to the direction passed,
providing the equivalent of a recovery initiated
recovery request. In addition, RCS_RESPOND_TO_INITIATOR
is set off to prevent response generation and
RCS_RETURN_TO_CALLER is set off to prevent return from
recovery initialization when processing is complete.
Recovery initialization is invoked (Step 415) to
initiate the processing.

In the participant case, a recovery control 
structure and a suspended recovery process 
should already exist. The process is suspended 
while in timer-wait, retrying the initialization of a 
conversation to the participant at the end of each 
time interval. After verifying this (Step 416), the 
PCS for the participant associated with the passed 
recovery path identifier is located and the 
PCS_STATUS is set (Step 417) to RESPONDED, 
as if the participant had actually responded, and 
the SPL_PARTICIPANT_STATE is set to the di- 
rection passed; then the sync point log entry is 
updated. Next, the SPL_SUSPENDED_PROCESS_ID is used to call
the resume service to restart the suspended pro- 
cess (Step 418). In either case, there is a response 
made to the originating request (Step 419), indicat- 
ing that the proper substitutions have been made 
and the recovery process is active again. If the 
purge option is passed, RCS_ERASE_LOG is
turned on to erase the sync point log entry at the 
conclusion of processing. 

FIG. 45 is a flowchart which illustrates the steps
required for the recovery initialization function (Step
304). After initialization (Step 303) the
RCS_RETURN_TO_CALLER flag determines (Step 421) whether
the participation driver is invoked in the current
process (ON) or in a separate, parallel process (OFF).
Where RCS_RETURN_TO_CALLER is set, the participation
driver is invoked (Step 422), passing the recovery
control structure. Otherwise, the RCS_FUNCTION_ID is
set to indicate the "participation driver" and the
sub-process starter service is invoked for each
recovery control structure passed (Step 423).
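
A minimal sketch of this dispatch decision follows. A Python thread is used here purely as a stand-in for the sub-process starter service, and the dictionary form of the recovery control structure and the function signature are assumptions made for illustration.

```python
import threading


def recovery_initialization(rcs_chain, participation_driver):
    """Invoke the participation driver for each recovery control structure:
    in the current flow when RCS_RETURN_TO_CALLER is on, otherwise in a
    separate parallel worker (a thread stands in for a sub-process)."""
    workers = []
    for rcs in rcs_chain:
        if rcs["RCS_RETURN_TO_CALLER"]:
            # Synchronous case: the caller waits for recovery to finish.
            participation_driver(rcs)
        else:
            # Asynchronous case: name the function to run, then start it.
            rcs["RCS_FUNCTION_ID"] = "participation driver"
            worker = threading.Thread(target=participation_driver, args=(rcs,))
            worker.start()
            workers.append(worker)
    return workers
```

Returning the started workers lets a caller join them if it needs to, which is a convenience of this sketch rather than something the text describes.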

FIG. 46 is a flowchart which illustrates the flow 
for the participation driver Step 317. The primary 
function of the participation driver is to initiate 
communications with the participants of the failing 
sync point and obtain responses from them in 
order to insure that the associated sync point logs 
are at the same level as they were when the sync 
point began and provide sync point state informa- 
tion that will provide the basis for resolving the 
sync point. 

After initiation of the participation driver (Step
430), the SPL_SYNCPOINT_STATE is set (Step 431)
according to the current RCS_PARTICIPANTS_STATE. If
participation control structures have not already been
built for the sync point participants, they are built
at this time, chained together, and anchored to the
current recovery control structure. PCS_PATH_ID comes
from the SPL_RECOVERY_PATH_ID of each participant and
the PCS_STATUS is initialized to RESTART, unless
SPL_PARTICIPANT_STATE indicates that the sync point is
resolved for the particular participant, whereupon it
is set to RESPONDED. The flow of Steps 432-444 is
controlled by the PCS_STATUS value for each
participant. The possible values are:

(1) RESTART - indicates that a conversation with the
participant is required.

(2) CONNECTED - indicates that there was success in
initializing a conversation with the participant and
causes the sending of the recovery request message to
the participant.

(3) RESPONDED - indicates that the sending of the
recovery request message to the participant completed
with a response from the participant. The response
processing driver is invoked (Steps 438-439) to handle
the response.

(4) RETRY - indicates failure in an attempt to connect
(i.e., establish a conversation) (Steps 436-437) or
send a message (Steps 440-441), or a mismatch of log
names (Steps 440-441). After all PCS_STATUS flags for
participants have progressed beyond the RESTART and
CONNECTED status, but there are some that have
encountered communications failures (the remainder
RESPONDED), the participation driver for the current
sync point recovery suspends itself for a timed
interval. When the suspension is completed, all
PCS_STATUS of RETRY are changed to RESTART, which
causes attempts to reconnect.

The multiple event wait service (Step 433) is used to
wait for completion of the first of any outstanding
connect or send service requests, returning control to
the participation driver with the path identifier and
indication of success or failure. The recovery request
sent to the participant (Steps 434-435) includes the
log name of the sending recovery facility 70 and the
expected log name associated with the participant. The
RCS_PARTICIPANTS_STATE is sent to permit a comparison
with the participant's actual state, defining the
appropriate recovery action. The timed wait service
(Steps 442-443) is used to delay processing for a
system-defined time interval before re-attempting
unsuccessful initiation of a conversation. This
intentional delay is undertaken only after all
participation paths have been driven and some failures
have been encountered. Timed-wait completion (Step 444)
serves to restart suspended processes which causes
another attempt to connect with the participant. After
all participants have attained a RESPONDED status and
completed processing by the response processing driver,
the initiator response driver is invoked (Step 445) to
handle possible responses to the recovery process
that represents the sync point initiator.
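
The PCS_STATUS transitions described above can be sketched as a small state machine. This is a simplification under stated assumptions: connects and sends are performed one after another rather than through the multiple event wait service, and the callables `connect`, `send`, `handle_response` and `timed_wait`, together with the `max_rounds` bound, are hypothetical parameters introduced so the sketch can run on its own.

```python
def participation_driver(paths, connect, send, handle_response, timed_wait,
                         max_rounds=10):
    """Drive every participant path to RESPONDED, retrying failures after
    a timed wait. Returns the final PCS_STATUS per path."""
    status = {path: "RESTART" for path in paths}
    for _ in range(max_rounds):
        for path in paths:
            if status[path] == "RESTART":
                # A conversation with the participant is required.
                status[path] = "CONNECTED" if connect(path) else "RETRY"
            if status[path] == "CONNECTED":
                # Send the recovery request; None models a failed send.
                reply = send(path)
                if reply is not None and handle_response(path, reply):
                    status[path] = "RESPONDED"
                else:
                    # Send failure or log name mismatch.
                    status[path] = "RETRY"
        if all(value == "RESPONDED" for value in status.values()):
            return status
        # All paths have been driven and some failed: suspend, then
        # change every RETRY back to RESTART to attempt reconnection.
        timed_wait()
        for path in paths:
            if status[path] == "RETRY":
                status[path] = "RESTART"
    return status
```

Note that the wait is taken only once per round, after every path has been driven, which mirrors the statement that the intentional delay occurs only after all participation paths have been driven and some failures encountered.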

FIG. 47 is a flowchart which illustrates the
processing required to process a response to a recovery
request sent to a participant in a failed sync point.
The response processing driver (Step 318) is passed the
sync point identifier, path identifier, and the state
received from the participant (Step 450). Then, the log
name exchange response is processed (Step 451). If log
names do not match, flow is returned to the
participation driver (Step 317, FIG. 36) with an error
that will cause a timed-wait retry to occur.

The sync point identifier is used to locate the sync
point log entry; then the path identifier is used to
locate the participant in the body of that sync point
log entry, matching on SPL_RECOVERY_PATH_ID. Then the
SPL_PARTICIPANT_STATE is updated with the state (Step
452).

The RCS_INITIATOR_RESPONSE_STATE
is updated in some cases as a result of unexpected
responses from participants, e.g. reflecting
unilateral (heuristic) decisions (Step 453). Finally, the
disconnection service is invoked to disconnect the
current path (Step 454).
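
The lookup and update performed by the response processing driver can be sketched as below. This is an illustrative sketch only: the classes, the dictionary-based log, and the `disconnect` callback are assumptions made for the example, and only the field names echo SPL_RECOVERY_PATH_ID and SPL_PARTICIPANT_STATE from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Participant:
    recovery_path_id: str   # plays the role of SPL_RECOVERY_PATH_ID
    state: str = "UNKNOWN"  # plays the role of SPL_PARTICIPANT_STATE


@dataclass
class SyncPointLogEntry:
    participants: List[Participant] = field(default_factory=list)


class LogNameMismatch(Exception):
    """Signals the caller to schedule a timed-wait retry."""


def process_response(
    log: Dict[str, SyncPointLogEntry],
    sync_point_id: str,
    path_id: str,
    state: str,
    log_names_match: bool,
    disconnect: Callable[[str], None],
) -> Participant:
    # Log name exchange response: a mismatch is returned as an error.
    if not log_names_match:
        raise LogNameMismatch(path_id)
    # Locate the sync point log entry, then the participant by path id.
    entry = log[sync_point_id]
    participant = next(
        p for p in entry.participants if p.recovery_path_id == path_id
    )
    participant.state = state  # record the state received from the participant
    disconnect(path_id)        # disconnect the current path
    return participant
```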

FIG. 48 is a flowchart which illustrates the
initiator response driver (Step 319). First, the
Initiator Response Driver is initiated (Step 460).
When the RCS_RESPONSE_TO_INITIATOR is
not set (decision block 461), it is not necessary to
respond; therefore, it is only necessary to erase
(Step 468) the sync point log entry. Response is
also bypassed when (Step 462) there is no initiator
to which to respond, i.e. when the recovery facility
represents the first node in the sync point tree.

Where there is no suspended recovery initiated
recovery request (event illustrated in FIG. 41) to
handle the response to the initiator and there is no
existing conversation to which to respond to the
initiator (Decision Step 479), then it is appropriate
to attempt upstream communications with the
recovery facility that represents the initiator in order
to notify it that the participant represented by the
current recovery facility 70 is ready with a
response (Step 464). This is most effective when
there is a recovery facility for the initiator that is in
timed suspension due to an earlier failed attempt to
communicate with the local recovery facility 70,
i.e., when the currently completed recovery
resulted from a sync point failure that resulted in a
failure of the local recovery facility 70 (event
illustrated in FIG. 43). This upstream communication
would have the effect of prematurely terminating
the timed suspension and therefore minimizing the
delay in resolving the sync point. FIG. 39, Step 374
illustrates the action by the receiving recovery
facility (representing the initiator).

If the SPL_SUSPENDED_PROCESS_ID is
not defined and the RCS_PATH_ID is not set
(decision block 479), the upstream communication
is accomplished by finding the entry for the initiator
in the body of the sync point log entry for the
recovering sync point and using the
SPL_RECOVERY_PATH_ID that is associated
with it to invoke the connection service for
SPL_RECOVERY_PATH_ID. There is no retry
when this attempt to initialize a conversation fails
("no" decision path in Step 464A) because it is an
optional optimization to complete the conversation
and notify the initiator. If the conversation is
initiated ("yes" decision path in Step 464A), a normal
exchange of log names request is sent (Step
464B), as illustrated in FIG. 38, steps 364 through
367, followed by exit via decision step 477. If the
connection is not completed, recovery termination
is invoked (Step 479).

When the RCS_PATH_ID is not set (Decision
Block 465), the response to the initiator (Steps 466
and 467) is bypassed. Otherwise, a normal
response to the initiator is made, using the
RCS_INITIATOR_RESPONSE_STATE (Step
466) and the respond service (Step 467). In the
case where RCS_RESPOND_TO_INITIATOR or
RCS_ERASE_LOG is on (Decision Block 477),
the recovery termination function is invoked (Step
469) before completion.

FIG. 49 is a flowchart which illustrates the 
recovery termination logic (Step 306) which in- 
volves, after initiation in Step 470, cleaning up 
storage and control structures (Step 471), and ei- 
ther returning to the caller (END) or invoking the 
sub-process termination service to complete the 
current process (Step 472). 

ASYNCHRONOUS RESYNCHRONIZATION OF A 
COMMIT PROCEDURE 

When there is a failure during syncpoint pro- 
cessing in system 50, the following asynchronous 
resynchronization procedure and facilities are pro- 
vided to optimize the use of the participating ap- 
plications. This procedure avoids extended delays 
in executing the application which issued a commit 
because the application need not wait idly during 
resynchronization. Instead, as described in more 
detail below, the application can do other useful 
work while waiting for resynchronization. The sync- 
point manager and recovery facility execute this 
procedure provided either the application or a
system default requested it. The recovery facility 70
supports asynchronous resynchronization 
(resynchronization-in-progress) and supports the 
new enhancements to the architected intersystem 
communications flows in support of this
asynchronous resynchronization process. By way of
example, the intersystem communications protocols are
defined by IBM's Systems Network Architecture
LU 6.2 Reference: Peer Protocols, SC31-6808,
Chapter 5.3 Presentation Services - Sync Point
verbs. The architected intersystem communication 
enhancements within systems 50 include additional 
indications on such flows of Committed (last agent 
only), Forget, and Backout indicating resynch- 
ronization is in progress. In the data field defined 
for exchange log names between two different sys- 
tem recovery facilities during initial exchange or 
during resynchronization, there is an indicator that
the sender of the exchange log names supports 
resynchronization-in-progress. Exchange log names 
processing is described above in the section en- 
titled Log Name Exchange For Recovery of Pro- 
tected Resources. Both recovery facilities must 
support resynchronization-in-progress in order for 
the facility to be used. Finally, there is an indicator 
in the compare states data field that tells the part- 
ner that resynchronization is in progress. 
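
The requirement that both recovery facilities support resynchronization-in-progress can be illustrated with a short sketch. The `ExchangeLogNames` class and its `supports_rip` field are hypothetical; the text states only that the exchange log names data field carries such an indicator, not how it is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeLogNames:
    log_name: str
    supports_rip: bool  # hypothetical encoding of the indicator


def rip_enabled(local: ExchangeLogNames, partner: ExchangeLogNames) -> bool:
    """The facility is usable only when both sides advertise support."""
    return local.supports_rip and partner.supports_rip
```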

The foregoing section entitled Coordinated 
Sync Point Management of Protected Re- 
sources and FIG. 2, FIG. 54, FIG. 3, FIG. 4, and 
FIG. 5(a,b) describe and illustrate two partner ap- 
plications, 56A and 56D, their application environ- 
ments, their processing and successful commit 
processing. The present section will extend the 
above to include a description of a failure during 
commit processing which results in asynchronous 
resynchronization. It should be understood that the 
asynchronous resynchronization process described 
herein is also applicable when a protected con- 
versation is made between application partners on 
the same system and both are in different
application environments, for example different virtual
machines of the enhanced version of the VM
operating system ("VM" is a trademark of IBM
Corporation of Armonk, N.Y.). It should also be noted that
in other embodiments, application 56A or
application 56D could execute in a different type of
execution environment.

As described in the section entitled Coordi- 
nated Sync Point Management of Protected 
Resources, application 56A starts application 56D 
via a protected conversation (FIG. 5A, Step 530). 
Protected conversation adapters 64A and 64D reg- 
ister with their respective syncpoint managers (FIG. 
5A, Step 532). Figure 50A expands the processing 
done next by application 56A (FIG. 5A, Step 533). 
As shown in FIG. 50A, application 56A issues to 
syncpoint manager 60A a 'set syncpoint options 
wait = no' call to indicate that application 56A does 
not desire to wait indefinitely for a synchronous 
resynchronization if there is a failure during sync- 
point processing (Step 900) and syncpoint man- 
ager 60A records the option (Step 902). Similar 
processing (Steps 904 and 906 of FIG. 50B) is
done by application 56D after application 56A
contacts it to do some work (FIG. 5A, Step 533). It
should be noted that in the illustrated embodiment,
the architected default is WAIT = yes. However, if
desired, the default condition could be WAIT = no
at system 50A and system 50D. In such cases, it is
not necessary for application 56A and application
56D to issue the 'set syncpoint options' call if they
desired WAIT = no. 'Set syncpoint options' is a
local value. Therefore, the value of the 'syncpoint
options' in effect at the syncpoint manager where
the failure is detected is the one used.

Processing continues as described in the
foregoing section entitled Coordinated Sync Point
Management of Protected Resources and
illustrated in FIG. 2 and FIG. 5(a,b) steps 533A through
step 546. Summarizing the above details,
application 56A sends a request to application 56D over
the protected conversation causing application 56D
to update file 78D. Application 56D replies to
application 56A causing application 56A to update
files 78A and 78B. Application 56A issues a
commit (Step 534 of FIG. 5A), causing syncpoint
manager 60A to call protected conversation adapter
64A to send a phase one 'prepare' call to protected
conversation adapter 64D. This causes application
56D to receive a request asking it to issue a
commit. Application 56D issues a commit (Step
537) and syncpoint manager 60D does its phase
one processing and calls protected conversation
adapter 64D to reply 'request commit' to protected
conversation adapter 64A. At this time syncpoint
manager 60D's state is 'in doubt' (and is so noted
on its log 72D). Protected conversation adapter
64A replies 'request commit' to syncpoint manager
60A. Since its other resources also replied 'request
commit', syncpoint manager 60A's state is now
'committed' and it writes this state to its log 72A.
Syncpoint manager 60A now contacts its registered
resources with the phase two decision of
'committed' (FIG. 5b, Step 545). Protected
conversation adapter 64A then sends the phase two
decision of 'committed' to protected conversation
adapter 64D (FIG. 5b, Step 546). However, during
this processing protected conversation adapter 64A
discovers a failure such that the path between
system 50A and system 50D for the protected
conversation between application 56A and
application 56D is no longer available. Protected
conversation adapter 64A replies 'resource failure' to
syncpoint manager 60A. This is an interruption in
syncpoint manager 60A's processing (FIG. 5b, Step
550), causing syncpoint manager 60A to start
recovery processing (FIG. 5b, Step 557).

The recovery procedures are defined by the
two-phase commit example being used. In the
illustrated embodiment, the two-phase commit example
is the one used in the section entitled
Coordinated Sync Point Management of Protected
Resources. Recovery processing occurs if a
protected resource adapter replies abnormally to the
syncpoint manager's phase one or phase two call.
The abnormal reply is the result of a resource
failure which may be caused by a system failure,
path failure, program failure or resource manager
failure. Recovery is conducted independently for
each failed protected resource for which it is
required. Recovery has the following purposes:

1. to place protected resources in a consistent 
state if possible; if not possible, to notify the 
operators at the system or, in the case of a 
failed protected conversation, systems that de- 
tected the damage; 

2. to unlock locked resources in order to free 
them for other uses; and 

3. to update the recovery facility log, showing
that no more syncpoint work is needed for all
protected resources, for that LUWID.

The steps involved in recovery, i.e.,
resynchronization, include the following:

1. The data structures from the recovery facility
log records representing the status of the
syncpoint operation are restored if the system failed
where this recovery facility operates. From these
data structures, the recovery facility can
determine the resources for which it is
responsible for initiating recovery. (In other
embodiments the recovery facility might be
called the syncpoint manager because one
facility performs both syncpoint and recovery
processing.) If the recovery occurs without a
system failure, it is not necessary to restore
information from the log because the data
structures written during syncpoint and used by
the recovery facility are still intact.

2. A program in the recovery facility that is 
responsible for initiating recovery is started. For 
the conversation example used for protected 
conversations in this illustrated embodiment this 
means: 

for protected conversations, establishing a
non-protected conversation of a type requiring
confirmation with a partner recovery program
running in the recovery facility in the system
originally involved in the syncpoint (this may
require a new path between the two systems to
be activated);

exchanging log names to verify that the 
partner has the appropriate memory of the 
LUWID; 

comparing and adjusting the state of the 
LUWID (i.e., commit or backout) at both part- 
ners; and 

erasing recovery facility log entries and noti- 
fying the operators at both partners of the out- 
come when the recovery completes. 

3. For other resource managers participating in
the two-phase commit processing, a similar
method of recovery is defined. In general,
recovery processing for protected resource managers
that do not distribute is defined by operating
systems implementing syncpoint support.
Recovery processing for protected conversations
is defined by an intersystem communications
architecture. By way of example, the former can
be of a type described by the enhanced version
of the VM operating system ("VM" is a
trademark of IBM Corporation of Armonk, N.Y.); the
latter can be of a type defined in part by
Systems Network Architecture LU 6.2 Reference:
Peer Protocols, SC31-6808 Chapter 5.3
Presentation Services - Sync Point verbs.

Next, syncpoint manager 60A calls recovery
facility 70A with the identifier of the resource that
failed (in this example the resource would be
protected conversation 64A) and the LUWID being
processed. Recovery facility 70A finds the log
entry for the LUWID and the entry for protected
conversation 64A (FIG. 4, Step 518). Recovery
facility 70A determines the recovery decision from
the state information in the entry (Step 519). Based
on the processing described above the decision is
'Commit'. Recovery facility 70A knows the resource
to be recovered is a protected conversation and
starts a recovery process which is an application
whose processing is described by the recovery
methods architected for the conversation and
two-phase commit paradigm being used. That recovery
process starts a non-protected conversation for a
partner recovery process in recovery facility 70D
on system 50D (Step 520). The recovery attempt
fails because a conversation cannot be started
between the two systems (decision block 521, the No
branch) due to a path failure. Recovery facility 70A
then checks the log entry to see whether
application 56A had requested WAIT = No, meaning
recovery facility 70A could return to syncpoint manager
60A before recovery was complete. Recovery
facility 70A could then complete recovery later
asynchronously from application 56A (Step 524). This
information was written by syncpoint manager 60A
during its phase one log write. As described above,
application 56A issued a 'set syncpoint options
wait = no' call. Therefore recovery facility 70A
returns to syncpoint manager 60A with the intent of
the recovery, i.e., commit, and an indication that
resynchronization (recovery) is still in progress
(Step 526). Because syncpoint manager 60A had
already heard 'forget' from its other protected
resources (FIG. 5b, Step 545A), it updates the value
of the LUWID by one and returns to application
56A with a return code of "RC =
OK.LUW_OUTCOME_PENDING" which indicates
the intended outcome, Commit, and that not all
resources have been committed (FIG. 5a, Step
558). This means that the commit processing will
be completed asynchronously to application 56A.
Thus, application 56A can then continue processing
other work and not waste time waiting for
resynchronization.
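
The effect of the WAIT option on the recovery facility can be sketched as follows. This is a simplified illustration under stated assumptions: `recover`, `RecoveryResult`, and the `attempt` callback are hypothetical names, and the WAIT = yes branch loops without the timed wait a real system would insert between attempts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecoveryResult:
    intent: str               # "commit" or "backout", taken from the log entry
    resync_in_progress: bool  # True when recovery will finish asynchronously


def recover(intent: str, wait: bool, attempt: Callable[[], bool]) -> RecoveryResult:
    """Try recovery once; on failure, honor the logged WAIT option."""
    if attempt():
        return RecoveryResult(intent, resync_in_progress=False)
    if not wait:
        # WAIT = no: return the intent now; recovery continues later,
        # asynchronously from the application.
        return RecoveryResult(intent, resync_in_progress=True)
    # WAIT = yes: keep trying until recovery completes (a real system
    # would delay between attempts rather than loop immediately).
    while not attempt():
        pass
    return RecoveryResult(intent, resync_in_progress=False)
```

With WAIT = no the caller regains control after a single failed attempt, which is what allows the syncpoint manager to return an outcome-pending code to the application.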

Recovery facility 70A repeatedly tries to suc- 
cessfully complete recovery for protected conver- 
sation adapter 64A with recovery facility 70D on 
system 50D (FIG. 4, Step 527). When recovery is 
started and finally completed (decision block 521, 
YES branch) both recovery facility 70A and recov- 
ery facility 70D write operator messages stating 
that the recovery had started and that it had suc- 
cessfully completed (Step 522). Syncpoint man- 
ager 60D had also learned of the failed conversa- 
tion through its registered resource, protected con- 
versation adapter 64D. It too had contacted its 
recovery facility 70D, with the identifier of the failed 
resource, in this case protected conversation 64D, 
and the LUWID. Based on the syncpoint manager 
state of "in doubt", recovery facility 70D knew it 
had to wait to be contacted for recovery by recov- 
ery facility 70A. When the recovery finally com- 
pletes (decision block 523, YES branch), recovery 
facility 70D returns to syncpoint manager 60D a 
decision of commit (Step 523A). Syncpoint man- 
ager 60D then performs its phase two processing. 
Because of the protected conversation breakage, 
syncpoint manager 60D subsequently gets a new 
unique LUWID. It then returns to application 56D 
with an outcome of Commit. Application 56D can
now perform its processing. It should be noted that
in the previous example, there could have been a 
failure with file manager 63A in step 545A instead 
of with the protected conversation, represented to 
syncpoint manager 60A by protected conversation 
adapter 64A. In this alternate case, recovery facility 
70A would initiate recovery with file manager 63A 
instead of recovery facility 70D based on the re- 
covery methods for non-protected conversations 
defined by the operating system. 

In FIG. 5(a,b), application 56A (and thus sync- 
point manager 60A) was the initiator of the commit 
request. However, FIG. 51 illustrates another exam- 
ple in which another application 56H at System 
50H initiated a commit (Step 700) instead of ap- 
plication 56A. Application 56H is running in an 
application environment that can be similar to or 
different than the one where application 56A is 
running; however, both systems and application 
environments support the aforesaid communica- 
tions and two-phase commit procedures. System 
50A and System 50D are the same as in FIG. 2. 
For purposes of the example illustrated in FIG. 51, 
(and FIGS. 52 and 53 which follow), application 
56H issued a commit request (SYNCPT) to sync- 
point manager 60H within System 50H which com- 
mit request involved resources in system 50H, sys- 
tem 50A, and system 50D. In response to the 
commit request, syncpoint manager 60H calls its 
registered resource protected conversation adapter
64H with a phase one 'prepare' call. Protected
conversation adapter 64H then sends the intersystem
architected 'prepare' call to protected conversation
adapter 64B within System 50A (Step 701). As
noted above, the 'prepare' signal is part of the first
phase of the two-phase commit procedure. Next,
protected conversation adapter 64B gives
application 56A a notification of "Take Syncpoint" (Step
704), and in response, application 56A issues a
commit request (SYNCPT) to syncpoint manager
60A (Step 706). Next, syncpoint manager 60A calls
protected conversation adapter 64A with a phase
one 'prepare' call. Protected conversation adapter
64A sends an architected intersystem prepare call
to protected conversation adapter 64D in System
50D (Step 708). In response, protected
conversation adapter 64D gives application 56D a
notification of "Take Syncpoint" (Step 710). In response,
application 56D issues a commit (SYNCPT) request
to syncpoint manager 60D (Step 712). Syncpoint
manager 60D issues a phase one 'prepare' call to
all its registered resources. When all the resources
accessed by syncpoint manager 60D are ready to
commit, syncpoint manager 60D calls protected
conversation adapter 64D with a reply of 'request
commit'. Protected conversation adapter 64D
sends an architected intersystem 'request commit'
call to the initiator of the commit request, in this
case protected conversation adapter 64A which
replies to syncpoint manager 60A 'request commit'
(Step 714). After syncpoint manager 60A receives
this request and notification that all of its resources
are ready, syncpoint manager 60A replies to
protected conversation adapter 64B with 'request
commit'. Protected conversation adapter 64B sends an
architected intersystem 'request commit' call to the
initiator of the commit request, in this case the
initiating protected conversation adapter 64H and
syncpoint manager 60H (Step 716). After receiving
this reply from protected conversation adapter 64H
on behalf of syncpoint manager 60A and
notification that all of syncpoint manager 60H's resources
are ready, syncpoint manager 60H's phase two
decision is commit. Syncpoint manager 60H calls
all resources with a phase two decision of 'commit'.
When protected conversation adapter 64H is called
it sends an architected intersystem 'commit' call
to protected conversation adapter 64B which in
turn replies 'committed' to syncpoint manager 60A
which becomes its phase two decision (Step 718).

So far, there have been no problems in
implementing the two-phase commit procedure. Also,
it should be noted that after each application issues
the commit request to the respective syncpoint
manager in Steps 700, 706 and 712, the respective
syncpoint managers log the phase one
information and state into the respective recovery facility
logs. Similarly, when each of the syncpoint
managers 60A and 60D receives the notifications from
its associated resources that all resources are
ready, they log 'in doubt' in their respective
recovery facility log entries. If one or more resources
cannot commit, no log entry is made, but backout
processing is completed before replying 'backout'
to its upstream initiator. Similarly, when syncpoint
manager 60H receives 'request commit' for all its
registered resources, it writes the decision of
'commit' in its recovery facility log. When
syncpoint managers 60A and 60D, respectively, receive
the commit decision, they too will write the commit
decision in their respective recovery facility logs
before contacting their registered resources.

Next, syncpoint manager 60A calls all its
registered resources with the phase two 'commit'
decision. When syncpoint manager 60A calls
protected conversation adapter 64A with the 'commit'
call, protected conversation adapter 64A attempts
to send an architected intersystem 'committed' call
to protected conversation adapter 64D which in
turn should reply committed to syncpoint manager
60D. In the illustrated example, however, this
transmission is unsuccessful (Step 720) due to a failure
in the conversation path. In response to this failure,
syncpoint manager 60A contacts recovery facility
70A for recovery processing for this LUWID and
protected conversation. As described above,
recovery facility 70A tries once to perform recovery with
recovery facility 70D (Step 722). This attempt is
also unsuccessful in this example due to the
persistence of the communication path failure. Next,
recovery facility 70A reads the log entry and learns
that asynchronous resynchronization is required.
Recovery facility 70A then notifies syncpoint
manager 60A of the failed attempt to recover and that
recovery will continue asynchronously. Syncpoint
manager 60A then calls protected resource adapter
64B with 'forget, resynchronization-in-progress
(RIP)'. Protected conversation adapter 64B sends
an architected intersystem 'forget, RIP' call to
protected conversation adapter 64H which replies
'forget, RIP' to syncpoint manager 60H (Step 726).
Syncpoint manager 60A then gives application 56A
a return code, "RC =
OK.LUW_OUTCOME_PENDING", to advise
application 56A of the intent of Commit and that the
commit processing will be completed
asynchronously (Step 724). The "Forget, RIP" notification of
Step 726 serves as an acknowledgement to Step
718 and causes syncpoint manager 60H to write a
state of 'forget' in its recovery facility log for the
syncpoint information relating to the syncpoint of
Step 700 because two-phase commit processing is
now complete for the commit requested by
application 56H. Syncpoint manager 60H, upon
receiving the "Forget, RIP" indication from its
protected conversation adapter 64H (and assuming it
had heard from all other resources involved in the
commit) can return to application 56H with a return
code, "RC = OK.LUW_OUTCOME_PENDING",
advising application 56H of the intent of Commit
and that the commit processing will be completed
asynchronously (Step 728).

Recovery facility 70A periodically attempts to
execute recovery processing with recovery facility
70D on system 50D and to simultaneously order
the commit (Step 730). As discussed above, when
recovery is complete, recovery facility 70D replies
to syncpoint manager 60D with a phase two
decision of 'commit'. Syncpoint manager 60D will
complete its phase two processing and return to
application 56D with a return code, "RC =
OK.ALL_AGREED", meaning the commit request
completed successfully (Step 732). Applications
56H, 56A, and 56D can all continue with other
processing. It should be noted that when recovery
processing takes place between recovery facility
70A and recovery facility 70D, messages are sent
to the operator consoles indicating recovery is
starting and the outcome of the processing.

It should be noted also that when syncpoint 
manager 60A received the "FAILED ATTEMPT TO 
RESYNC" notification from recovery facility 70A, 
syncpoint manager 60A updates the state for the 
LUWID to 'Forget.RIP' in the log entry in log 72A. 
System 50A will later write a state of 'forget' for 
this LUWID when the next normal flow arrives over 
the conversation path between System 50A and 
System 50H which has or had carried the protected 
conversation involved in this LUWID. This is an 
"implied forget" operation. If there is a failure such 
that the conversation path fails between System 
50A and System 50H (over which the protected 
conversation flowed that was involved in the com- 
mit procedure which received the 
resynchronization-in-progress notification) after 
syncpoint manager 60A writes the state of 
'Forget.RIP' and before the "implied forget" is re- 
ceived, the log entry for the LUWID at System 50A 
will be erased by normal recovery procedures as 
defined by the two-phase commit paradigm being 
used. This would involve, however, that new 
resynchronization-in-progress indicators be sent in 
the compare states data flow as defined earlier. It 
should also be noted that if the "implied forget" is 
received causing System 50A to write a state of 
'forget' on recovery facility log 72A, recovery fa- 
cility 70A will not allow the recovery record to 
really be forgotten until recovery is complete with 
recovery facility 70D. 
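
The interaction between the "implied forget" and outstanding recovery can be sketched as a small state holder. This is an illustration only; the class, its method names, and the state strings are assumptions chosen for the example and do not reflect the actual log record layout.

```python
from dataclasses import dataclass


@dataclass
class LuwLogEntry:
    state: str = "FORGET_RIP"        # written after the failed resync attempt
    recovery_complete: bool = False  # set once recovery with the partner succeeds

    def implied_forget(self) -> None:
        """Next normal flow arrived on the conversation path."""
        self.state = "FORGET"

    def complete_recovery(self) -> None:
        """Recovery with the partner recovery facility has finished."""
        self.recovery_complete = True

    @property
    def erasable(self) -> bool:
        # The record is really forgotten only when both conditions hold.
        return self.state == "FORGET" and self.recovery_complete
```

Either event may arrive first; the entry becomes erasable only after both have occurred.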

It should also be noted that there is a migration
path between syncpoint managers such that
syncpoint managers which support the foregoing
asynchronous resynchronization
(resynchronization-in-progress) function can communicate with other
syncpoint managers that do not. When the systems
that support syncpoint processing originally
communicate with each other, it is determined in the
initial capabilities exchange as defined by the
communications architecture and the two-phase commit
procedures used by both systems if they support
the foregoing resynchronization-in-progress
function. If the initiator of the commit request, in the
above example from FIG. 51, syncpoint manager
60H, does not support
resynchronization-in-progress, the cascaded initiator (the syncpoint
manager that receives the commit request, in the
above example, syncpoint manager 60A) will send
back to the syncpoint manager who initiated the
commit request (in the above example syncpoint
manager 60H) the intent of a syncpoint request
(either commit or backout) and not an indication
that resynchronization will take place later
asynchronously. The local application, where the outage
took place (in the above example, application 56A)
and where the syncpoint manager supports
resynchronization-in-progress (in the above
example, syncpoint manager 60A), will receive this
resynchronization-in-progress notification.
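
The migration rule can be sketched as a pair of small functions. The function names and the tuple form of the reply are assumptions for illustration; the text specifies only which parties receive the resynchronization-in-progress indication.

```python
from typing import Tuple


def upstream_reply(
    intent: str, recovery_pending: bool, initiator_supports_rip: bool
) -> Tuple[str, bool]:
    """Reply sent by the cascaded initiator to the initiating syncpoint manager.

    Returns (intent, rip_indicated). The indication is sent upstream only
    when the initiator negotiated resynchronization-in-progress support.
    """
    return intent, recovery_pending and initiator_supports_rip


def local_notification(intent: str, recovery_pending: bool) -> Tuple[str, bool]:
    """Notification given to the local application where the outage occurred.

    The local syncpoint manager supports the function, so the local
    application learns that resynchronization is still in progress.
    """
    return intent, recovery_pending
```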

FIG. 52 illustrates the resynchronization-in- 
progress function in the event that syncpoint man- 
ager 60H issues a backout as described in more 
detail below. Steps 700-716 are the same as in 
FIG. 51. However, after receipt of the 'request
commit' reply from syncpoint manager 60A via
protected conversation adapter 64H in Step 716,
syncpoint manager 60H decides to back out because
one or more of its protected resources are not
ready. Then, syncpoint manager 60H calls its
registered resources with a phase two decision of
'backout'. The 'backout' decision is given to
syncpoint manager 60A (protected conversation adapter
64H sends an architected intersystem backout call
to protected conversation adapter 64B who replies
'backout' to syncpoint manager 60A) (Step 740).
Syncpoint manager 60A calls its registered
resources with a phase two decision of 'backout'.
Protected conversation adapter 64A attempts to
send an intersystem backout call to syncpoint man- 
ager 60D via protected conversation adapter 64D in 
Step 742. However in the example, Step 742 fails 
due to a communication path failure or other type 
of failure. In response, syncpoint manager 60A 
calls recovery facility 70A with the LUWID and 
failed resource identifier to perform recovery pro- 
cessing with recovery facility 70D on System 50D 
in Step 744. However, in the illustrated example, 
this recovery attempt also fails. Recovery facility 
70A replies to syncpoint manager 60A that the 
recovery attempt failed, but that it will complete 
recovery processing asynchronously. Having heard 
from its other protected resources, syncpoint man- 
ager 60A writes a state of 'backout, rip' on its 



recovery facility log 72A. Syncpoint manager 60A 
then calls protected conversation adapter 64B with 
a reply of 'backoutrip'. Based on the architected 
intersystem backout call, protected conversation 
5 adapter 64B sends an error reply to the original 
phase two 'backout' call from protected conversa- 
tion adapter (Step 748). It then sends an architec- 
ted intersystem 'backout, rip' call to protected con- 
versation adapter 64H (Step 750). Having received 
w the 'backoutrip' indication, protected conversation 
adapter 64H sends an architected intersystem ac- 
knowledgement (Step 752) and replies 'backout,rip' 
to syncpoint manager 60H (Step 752). Having 
heard from its other resources, syncpoint manager 
75 60H returns to application 56H with a return code, 

"RC = Backout, LUW OUTCOME PENDING", 

which notifies it that backout is pending and to 
advise application 56H that it is free to perform 
other useful work (Step 754). When protected con- 
20 versation adapter 64B gets an acknowledgement to 
the 'backoutrip* call from protected conversation 
adapter 64H (response to steps 748 and 750) it 
replies 'ok' to syncpoint manager 60A. Syncpoint 
manager 60A then writes a state of 'forget' in the
log entry for this LUWID in recovery facility log 72A
and returns to application 56A with a return code,
"RC = Backout, LUW_OUTCOME_PENDING" (Step 746),
which means that the intended result of the commit
request is backout, but not all resources have
backed out. Application 56A can then continue with
its processing. The LUWID entry in recovery facility
log 72A will be forgotten by System 50A as an
"implied forget", which was described above. If the
failed resource in the LUWID has not yet been
recovered when 'forget' is written, the LUWID entry
is not actually forgotten until recovery takes
place. Meanwhile, recovery facility 70A continues to
attempt to recover with recovery facility 70D in
system 50D asynchronously (Step 756). When recovery
completes, syncpoint manager 60D then returns to
application 56D with a return code of
"RC = BACKOUT, ALL_AGREED", which means all
resources have backed out (Step 758). Applications
56H, 56A, and 56D can all continue with other
processing. It should be noted that when recovery
processing takes place between recovery facility
70A and recovery facility 70D, messages are sent to
the operator consoles indicating that recovery is
starting and the outcome of the processing.
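The backout flow just described can be illustrated with a minimal sketch. It is not the patented implementation: the class and function names (`Participant`, `RecoveryFacility`, `phase_two_backout`) are hypothetical, the network is reduced to a `reachable` flag, and the return code strings are normalized spellings of the codes quoted in the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReturnCode(Enum):
    # Spellings are normalized from the return codes quoted in the text.
    BACKOUT_ALL_AGREED = "BACKOUT, ALL_AGREED"
    BACKOUT_OUTCOME_PENDING = "Backout, LUW_OUTCOME_PENDING"


@dataclass
class Participant:
    """Hypothetical stand-in for a registered protected resource."""
    name: str
    reachable: bool = True
    backed_out: bool = False

    def backout(self) -> bool:
        # The phase-two 'backout' call only takes effect if the path is up.
        if self.reachable:
            self.backed_out = True
        return self.reachable


@dataclass
class RecoveryFacility:
    """Hypothetical recovery facility: a state log plus a retry queue."""
    log: dict = field(default_factory=dict)      # LUWID -> list of states written
    pending: dict = field(default_factory=dict)  # LUWID -> unrecovered participants

    def write(self, luwid: str, state: str) -> None:
        self.log.setdefault(luwid, []).append(state)

    def recover(self, luwid: str, failed: Participant) -> bool:
        # Immediate recovery attempt; a failure is queued for asynchronous retry.
        if failed.backout():
            return True
        self.pending.setdefault(luwid, []).append(failed)
        return False

    def retry_pending(self, luwid: str) -> bool:
        # Asynchronous retry; returns True once every participant has recovered.
        remaining = [p for p in self.pending.get(luwid, []) if not p.backout()]
        if remaining:
            self.pending[luwid] = remaining
            return False
        self.pending.pop(luwid, None)
        self.log.pop(luwid, None)  # only now is the LUWID entry really forgotten
        return True


def phase_two_backout(luwid: str, participants: list,
                      facility: RecoveryFacility) -> ReturnCode:
    """Back out every participant and report whether the outcome is pending."""
    rip = False
    for p in participants:
        if not p.backout() and not facility.recover(luwid, p):
            rip = True
    if not rip:
        return ReturnCode.BACKOUT_ALL_AGREED
    facility.write(luwid, "backout, rip")  # outcome still pending downstream
    facility.write(luwid, "forget")        # written once upstream has acknowledged
    return ReturnCode.BACKOUT_OUTCOME_PENDING
```

In this sketch the application regains control with the pending return code as soon as the immediate recovery attempt fails, while `retry_pending` models the asynchronous recovery that eventually lets the log entry be dropped.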

FIG. 53 illustrates the resynchronization-in-progress
function in the event that syncpoint manager 60A
issues a backout, as described in more detail below.
Steps 700-714 are the same as in FIG. 52. However,
after receipt of the 'request commit' reply in Step
714, syncpoint manager 60A calls its registered
resources with a phase two call of 'backout' because
one or more of the resources associated with
syncpoint manager 60A cannot commit (Step 759).
Protected conversation adapter 64A attempts to send
an architected intersystem 'backout' call to
protected conversation adapter 64D (Step 760).
However, as illustrated in Step 760, the 'backout'
call is not received by protected conversation
adapter 64D due to a communication path failure or
other failure. Syncpoint manager 60A calls recovery
facility 70A with the LUWID and failed resource
identifier, asking it to perform recovery
processing. Recovery facility 70A tries to perform
recovery processing with recovery facility 70D in
system 50D (Step 744). Step 744 also fails because
the communication path failure persists, and
consequently, syncpoint manager 60A transmits the
signal of step 746 described above in reference to
FIG. 52. Steps 750-758 are also the same as in
FIG. 52.

FIG. 53A illustrates the resynchronization-in-progress
function in the event that syncpoint manager 60A
issues a backout because of a different failure, as
described in more detail below. Steps 700-706 are
the same as in FIG. 52. However, after receipt of
the commit request in Step 706, syncpoint manager
60A calls its registered resources with a phase one
call of 'prepare'. Protected conversation adapter
64A attempts to send an architected intersystem
'prepare' call to protected conversation adapter
64D (Step 708a). However, as illustrated in Step
708a, the 'prepare' call is not received by
protected conversation adapter 64D due to a
communication path failure or other failure.
Syncpoint manager 60A calls its local registered
resource with a phase two call of 'backout' (Step
763). Syncpoint manager 60A then calls recovery
facility 70A with the LUWID and failed resource
identifier, asking it to perform recovery
processing. Recovery facility 70A tries to perform
recovery processing with recovery facility 70D in
system 50D (Step 744). Step 744 also fails because
the communication path failure persists, and
consequently, syncpoint manager 60A transmits the
signal of step 746 described above in reference to
FIG. 52. Steps 750-756 are also the same as in
FIG. 52. Asynchronously to the processing being
done by syncpoint manager 60A, application 56D
receives a path failure indication on the protected
conversation with application 56A that was
established when application 56A initiated
application 56D (Step 761). This path failure
prevented protected conversation adapter 64D from
receiving the 'prepare' call from protected
conversation adapter 64A. Because the path failure
was on a protected conversation, application 56D
must issue a backout request. Application 56D
issues a backout request (Step 762) and eventually
receives a return code that indicates all registered
resources are backed out (Step 764). At this point,
applications 56H, 56A, and 56D can all continue
with other processing. Meanwhile, recovery facility
70A continues to attempt to recover with recovery
facility 70D in system 50D asynchronously (Step
756). It should be noted that when recovery
processing takes place between recovery facility
70A and recovery facility 70D, messages are sent to
the operator consoles indicating that recovery is
starting and the outcome of the processing.
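The rule applied by application 56D, that a path failure on a protected conversation obliges the application to back out, can be sketched as follows. This is a simplified illustration only: `Conversation`, `Resource`, `PathFailure`, and `handle_conversation` are hypothetical names, and the returned strings are placeholders rather than the architected return codes.

```python
from dataclasses import dataclass


class PathFailure(Exception):
    """Hypothetical signal that a conversation's communication path was lost."""


@dataclass
class Conversation:
    protected: bool
    path_up: bool = True

    def receive(self) -> str:
        if not self.path_up:
            raise PathFailure("conversation path lost")
        return "data"


@dataclass
class Resource:
    name: str
    state: str = "updated"

    def backout(self) -> None:
        self.state = "backed out"


def handle_conversation(conversation: Conversation, resources: list) -> str:
    """Receive on a conversation; back out if a protected path fails."""
    try:
        conversation.receive()
        return "in progress"
    except PathFailure:
        if not conversation.protected:
            # An unprotected conversation carries no backout obligation here.
            return "path failure"
        # Protected conversation: the application must issue a backout request.
        for resource in resources:
            resource.backout()
        return "all backed out"
```

The sketch treats the backout of local resources as always succeeding, which matches the example in the text where application 56D eventually receives a return code indicating that all registered resources are backed out.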

Based on the foregoing, processes and systems
embodying the present invention have been
disclosed. However, numerous modifications and
substitutions may be made without deviating from
the scope of the invention. Therefore, the invention
has been disclosed by way of illustration and not
limitation, and reference should be made to the
following claims to determine the scope of the
invention.

The following is a partial glossary of terms. 

Application

User or service program(s) or a work distribution
function integrated with a resource manager, that
execute in an execution environment and can issue
one or more of the following: commit, back out or
work request.

Execution Environment

Any computing means for executing applications,
system facilities (recovery facility, communication
facility, etc.), resource managers, and/or other
programs in virtual machine, personal computer,
work station, mini computer, mainframe computer,
and/or other type of computers.

Protected Conversation

A conversation that is subject to any form of
synchronization point processing or protective
commit or back out procedure.

Protected Resource

A resource that is subject to any form of
synchronization point processing or other protective
commit or back out procedure.

Recovery Facility

A facility that has a responsibility for recovery
of a failed synchronization point or other commit or
back out procedure.

Two-Phase Commit Procedure

A procedure for coordinating and/or synchronizing
a commit or back out of updates and/or a protected
conversation. Usually, the two phase commit
procedure is used to atomically commit or back out
a plurality of resources or a single resource via a
protected conversation. By way of example, the two
phase commit procedure can include a polling or
prepare phase and a back out or commit phase.
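As a rough illustration of the glossary definition, the two phases can be sketched in a few lines. The `Resource` class and `two_phase_commit` function are hypothetical and omit logging, recovery, and communication failures.

```python
class Resource:
    """Hypothetical protected resource that votes during the prepare phase."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "updated"

    def prepare(self) -> bool:
        # Phase one: report whether this resource is able to commit.
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def backout(self) -> None:
        self.state = "backed out"


def two_phase_commit(resources: list) -> str:
    """Commit all resources atomically, or back all of them out."""
    # Phase one (polling or prepare phase): collect every vote.
    votes = [resource.prepare() for resource in resources]
    decision = "commit" if all(votes) else "backout"
    # Phase two (commit or back out phase): apply the single decision to all.
    for resource in resources:
        if decision == "commit":
            resource.commit()
        else:
            resource.backout()
    return decision
```

A single negative vote in phase one is enough to turn the decision into a backout for every resource, which is what makes the outcome atomic.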

Claims

1. A computer system comprising:

sync point manager means for coordinating a
commit procedure involving a first resource
manager and a second resource manager, said first
resource manager being a first type, and said
second resource manager being a second type;

means, coupled to said sync point manager
means, for running an application which
initiates said commit procedure; and wherein

said sync point manager means includes
means for receiving notification of a failure or
failures relating to said first and second
resource managers that prevent completion of
said commit procedure, and identification of
the resource manager or resource managers
associated with the failure or failures, and

said sync point manager means includes
means for sending to said application, a failure
notification after receipt of notification of a
failure or failures relating to either or both of
said resource managers, and sending to said
application the identification of the resource
manager or resource managers associated with
the failure or failures.

2. A computer system as set forth in claim 1
wherein said sync point manager means includes
means for receiving cause of failure information
for each failure.

3. A computer system as set forth in claim 1 or 2
wherein said sync point manager means includes
means for sending said cause of failure information
to said application upon request by said
application.

4. A computer system as set forth in claims 1, 2
or 3 wherein said sync point manager means
automatically sends said error notification to said
application upon receipt of said error notification.



5. A computer system as set forth in claim 4 
wherein said sync point manager means sends 
a single error notification to said application 
when there are failures associated with both of 
said resource managers that prevent comple- 
tion of said commit procedure. 

6. A computer system as set forth in any one of
claims 1 to 5 wherein said first resource manager
is a shared file system.

7. A computer system as set forth in any one of
claims 1 to 6 wherein said second resource
manager is a SQL/DS system.

8. A computer system as set forth in claim 1 or
any one of claims 2 to 7 further comprising:

first resource adapter means, connected 
between said first resource manager and said 
sync point manager means, for sending said 
error notification and cause of failure informa- 
tion to said sync point manager means when 
communication is lost from said first resource 
adapter to said first resource manager; and 

second resource adapter means, connect- 
ed between said second resource manager 
and said sync point manager means, for send- 
ing said failure notification and said cause of 
failure information to said sync point manager 
means when communication is lost from said 
second resource adapter to said second re- 
source manager. 

9. A computer system as set forth in claim 8 
further comprising a reformatting function 
stored in said first resource adapter means to 
reformat said cause of error information to a 
form compatible with said application, and 
wherein said first resource adapter means in- 
cludes means for transmitting said reformatting 
function to said application. 

10. A computer system as set forth in claim 2 or
any one of claims 3 to 9 wherein said second
resource manager includes means for reading 
said cause of failure information from said 
sync point manager means, changing the for- 
mat of said cause of failure information to a 
format which is compatible with said applica- 
tion, and transmitting the reformatted informa- 
tion to said application. 

11. A computer system as set forth in any one of
the preceding claims further comprising:

a protected conversation adapter; and
wherein

said sync point manager means is also 
coupled to said protected conversation adapter 
and coordinates a two-phase commit proce- 
dure involving said protected conversation 
adapter; 

said sync point manager means includes 
means for receiving notification of a failure 
relating to said protected conversation adapter 
that prevents completion of said commit proce- 
dure, and identification of said protected con- 
versation adapter associated with the failure; 
and 

said sync point means includes means for 
sending a failure notification to said application 
program when there is a failure associated with 
said protected conversation adapter. 

12. A computer system as set forth in claim 11
wherein there is a single error notification sent 
to said application when there is a failure asso- 
ciated with said protected conversation adapter 
and at least one of said resource managers. 

13. A process for resource recovery, said process 
comprising the steps of: 

coupling a first resource manager of a first 
type to a sync point manager; 

coupling a second resource manager of a 
second type to said sync point manager; 

initiating a commit procedure involving 
said first and second resource managers, said 
sync point manager controlling said commit 
procedure; 

sending a notification of a failure in said 
commit procedure from either or both of said 
resource managers to said sync point man- 
ager; 

sending an identification of the failed re- 
source manager or resource managers to said 
sync point manager; 

sending a failure notification from said 
sync point manager to an application asso- 
ciated with said commit procedure; and 

sending an identification of said failed re- 
source manager or failed resource managers 
to said application. 



14. A process as set forth in claim 13 further 
comprising the step of sending cause of failure 
information to said application. 

15. A process as set forth in claim 14 further
comprising the steps of:

sending said cause of failure information
from the failed resource to said sync point
manager before the step of sending said cause
of failure information to said application; and

reading by the failed resource manager of
said cause of failure information from said
sync point manager; and

changing by said failed resource manager
the format of said cause of failure information
to a format which is compatible with said
application.

16. A computer program product having a computer
readable medium, said computer program product
comprising:

means for implementing a commit procedure
involving first and second resource managers;

means for receiving notification of a failure
or failures relating to said first and second
resource managers that prevent completion of
said commit procedure, and receiving
identification of the resource manager or resource
managers associated with the failure or failures;
and

means linked to the receiving means, for
sending a failure notification for either or both
of said resource managers; and

means linked to the receiving means, for
sending an identification of the failed resource
manager or resource managers.

17. A computer program product as set forth in
claim 16 wherein:

said computer program product defines at
least in part an operating system which controls
the execution of an application;

the failure notification sending means
sends said failure notification to said
application; and

the identification sending means sends
said identification to said application.

18. A computer program product as set forth in
claims 16 or 17 further comprising:

sync point manager means for controlling
said commit procedure; and

wherein said failure notification is sent to
said sync point manager and said identification
of the failed resource or resources is sent to
said sync point manager; and

said sync point manager sends said failure
notification and said failed resource manager
or resource managers identification to said
application.

19. A computer program product as set forth in
claims 16, 17 or 18 further comprising means
for receiving cause of failure information from
both of said resource managers.

20. A computer program product as set forth in
claim 19 further comprising means for sending
said cause of failure information back to the
failed resource manager.